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Preface 



This volume contains the proceedings of the conference on Computer Aided Veri- 
fication, CAV 2004, held in Boston, Massachusetts, USA, July 13-17, 2004. CAV 
2004 was the 16th in a series of conferences dedicated to the advancement of the 
theory and practice of computer-assisted formal analysis methods for software 
and hardware systems. The conference covers the spectrum from theoretical re- 
sults to concrete applications, with an emphasis on practical verification tools 
and the algorithms and techniques that are needed for their implementation. 
The conference has traditionally drawn participation from researchers as well as 
practitioners in both academia and industry. 

CAV 2004 was colocated with the ACM International Symposium on Soft- 
ware Testing and Analysis (ISSTA 2004), and was partially sponsored by the 
ACM SIGSOFT. The colocation reflects the increasing synergy between research 
on verification and testing, and emerging applications of formal verification to 
software analysis. Joint CAV-ISSTA events included a distinguished plenary lec- 
ture by David Harel and a special session exploring various approaches to the 
specification and verification of concurrent software organized by Shaz Qadeer. 
The program of the conference consisted of 32 regular papers and 16 tool pa- 
pers, selected from a total of 144 submissions. The conference was preceded by 
a tutorial on processor verification, on July 12, 2004, with three invited lectures 
by Randal E. Bryant, David L. Dill and Warren A. Hunt, Jr. The conference 
also included invited talks by Mary Jean Harrold and Thomas W. Reps. Two 
workshops were associated with CAV 2004: 

— BMC 2004: Second International Workshop on Bounded Model Checking 

— GDV 2004: Games in Design and Verification 

We would like to thank all the program committee members and the sub- 
referees for their excellent work in evaluating the submissions and the selection 
of the program. Our thanks also go to the steering committee members and the 
organizers of CAV 2003 for their helpful advice. Special thanks go to George 
Avrunin, the ISSTA General Chair, and Erin Dolan of ACM, for their help with 
the local organization. We would like to acknowledge the generous sponsorship 
of Cadence Design Systems, IBM Research, Intel Research, Microsoft Research 
and the John Von Neumann Minerva Center for Verification of Reactive Systems. 
Their support helped us reduce the registration fee for the graduate students. 

The CAV community is deeply saddened by the sudden demise of Dr. Rob T. 
Gerth. Rob was a highly regarded researcher with many notable contributions 
to the theory and practice of formal verification. For CAV 2004, we had invited 
Rob to join the program committee as well as to give a tutorial on processor 
verification. This LNCS volume containing the CAV 2004 proceedings is dedi- 
cated to Rob, and includes a special presentation, commemorating his scientific 
accomplishments and collaborations. 
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seizoen der dromen" 
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het komt wel voor 
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de angst toeslaat 

men in een leegte staart - 
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dat een man van in de veertig 
met ontstelde ogen toeziet 
hoe de vingers zich in elkaar 
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adem stokt in de keel - 

het komt wel voor 

dat men dan de dood ontmoet 

Wim Gerth 
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Rob Tristan Gerth: 1956—2003 

On Friday November 28, 2003, computer scientist and logician Rob Gerth died 
from sudden cardiac arrest. It is impossible to say all that the loss of Rob means 
to Intel, to the verification community, and to each of us personally. 

As a corporation, Intel has lost a world-renowned authority on formal mod- 
eling and verification. Rob brought an independence of thought and agility of 
mind that are the hallmarks of a rare intellect. He was one of the few people 
with the knowledge and experience to meet the challenges we face in verifying 
micro-architectural protocols; we hope we can find a way to carry on his legacy. 

As a community, we have lost not just a key technical contributor but part of 
our scientific conscience. It was in Rob’s nature to ask “Why?”, even when the 
answer seemed obvious. He brought a commitment to truth and high standards 
that was alternately challenging, frustrating, and inspiring; we will sorely miss it. 

As family, friends and colleagues, we will remember and cherish Rob’s warmth 
and love, his knowledge and openness in sharing it, his dry sense of humor, the 
way only he could read Toon Tellegen stories, and - perhaps most of all - his 
contagious laughter. 

Rob’s passion for knowledge reached far beyond computer science and logic: it 
embraced science, history, literature, art and music. The 2004 Proceedings of the 
16th International Conference on Computer Aided Verification are dedicated to 
him. This article in the Proceedings surveys highlights of Rob’s scientific career, 
followed by excerpts taken from some of the many tributes sent for Rob’s funeral 
by friends and colleagues from around the world. The list of contributors, though 
too long to be printed here, is itself a tribute to Rob’s impact on us all. 

May 2004 John O’Leary & Marly Roncken 

Intel 
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Rob Gerth, Program Verifier par Excellence: His Research 

Going through Rob Gerth’s contributions to program semantics and verification, 
and tools for the latter, one is struck by his almost incredible versatility, total 
commitment, and scientific depth. 

Starting in 1981 with a proof system for Brinch Hansen’s monitor-based 
language Distributed Processes [1], he followed with an important paper on 
“Transition Logic: How to Reason about Temporal Properties in a Gompo- 
sitional Way” [2], coauthored papers on “Gompositional Semantics for Real- 
Time Distributed Gomputing” for the predecessor of LIGS [3], with its Full- 
Abstraction proof [4], a (sound and relatively complete) “Proof System for 
Goncurrent Ada Programs” in Science of Computer Programming [5], and a 
far reaching paper in 1986 “Monitors Revisited: A First Step towards Ver- 
ifying Object-Oriented Systems” [6], far before the general research pub- 
lic in computer science realized the importance of concepts like composi- 
tionality, formalized real-time semantics, or relatively complete proof systems 
for monitor-based and object-oriented languages. 

This list continues to be extended in the same break-through baffling fashion: 
Goauthor of full-abstract semantics for Statecharts and concurrent Prolog [7, 8] , 
“Rooting UNITY” [9], his Program Refinement and Self-Modifying Gode [10, 11], 
coauthored papers for GONGUR [12,13] or Reliable Distributed Systems [14]. 
Who else in those years cared for investigating fully abstract program semantics, 
verifying fault-tolerant distributed programs, self-modifying programs, interface 
refinement, and the now crucial concept of compositional refinement? 

Having reached the limits of what can reasonably be verified in principle by 
hand, Rob saw that the time of automated verification had arrived and devoted 
himself with the same ruthless interest and intrepidity to developing foundations 
of abstract interpretation, model checking and automated refinement tools - to 
allow applications at industrial scale. The focus in his work now shifts to handling 
state explosion in (real-time) model checking. 

To this end, he developed partial-order techniques [15, 16] and other advanced 
techniques like on-the-fly model checking [17] and abstraction [18] - the latter 
with his “very own” Ph.D. student Dennis Dams and with Orna Grumberg. 
The foundational aspects of abstract interpretation in model checking were first 
presented at a memorable PROGOMET workshop in San Miniato [19], and 
culminated in 1997 in a TOPLAS seminal paper [20] and application paper [21]. 

At the end of this period, Rob organised the ESPRIT project VIRES (Ver- 
ifying Industrial Reactive Systems) where the developed theory and tools were 
applied to the verification of an industrial bus protocol. This project was his 
legacy to Europe, and the beginning of a new era in the verification of large 
systems. When Rob left the Technical University Eindhoven for Intel at the end 
of 1997, VIRES found a suitable new project lead in his former Ph.D student 
Dennis Dams. 

Rob’s “warming up” exercise at Intel was the ambitious exploratory project 
of formally verifying arithmetic hardware units to their IEEE specifications [22] . 
The technology and methodology developed in that project are used routinely 
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today to verify floating-point functionality in Intel’s Microprocessor design 
projects. 

He subsequently went back to a more familiar domain, developing a linear- 
time temporal logic - dubbed ForSpec [23] . The uniqueness of Forspec lies in its 
many special features for hardware specification, motivated by many Intel en- 
gineering years of both edge-cutting and practical design and validation. Rob’s 
unique blend of theoretical knowledge and practical experience in formal se- 
mantics and model checking invariably prevented inconsistencies and efficiency 
hazards, and helped solve the most difficult roadblocks. ForSpec is widely used 
in Intel, and has been incorporated in OpenVera"’"'^ , a language for hardware 
testbenches and assertions that is supported by a broad network of CAD com- 
panies. 

Just weeks before his death, Rob presented the first results of what he called 
his “bit-vector compiler” - software he had been developing to relate protocol 
models over abstract data types to ditto models over “hardware” datatypes, i.e. 
bits and bit- vectors. Following up on [24], the compiler is part of Rob’s protocol 
verification programme and perhaps the most cherished and ambitious project 
of his life. A life which he did not anticipate would end any time soon. 

If Edsger Wiebe Dijkstra, another Dutchman, opened the eyes of computer 
scientists for the value of proving programs correct, then in Rob Tristan Gerth a 
worthy modern-day successor and polymath was incarnated, whose own unique 
vision furthered the feasibility of Dijkstra’s programme. We shall all miss Rob’s 
enthusiasm, ruthless intellectual curiosity, and relentless energy. 
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Rob Gerth: Personal Recollections 

I had the pleasure of knowing Rob for over 10 years. He first appeared in the 
department of Computer Science at the Technion when I was doing a Phd, for 
a visit in Israel, while he collaborated with Liuba Shrira. 

When I finished the Phd, during my work at Bell Labs, Rob invited me several 
times to Eindhoven. We worked on several topics, and published together two 
papers. We worked with Ruurd Kuiper, Wojciech Penczek, Moshe Vardi and 
Pierre Wolper. It was always a lot of fun and great inspiration. 

We all know these people that we try to communicate with about our re- 
search, and they are busy and postpone their reply indefinitely, so after some 
time we give up and never hear from them again. Rob was the striking exception 
to that: it took me some time to understand that I can always and fully trust 
that even if we did not achieve a fully published results on a visit, there will 
be one day, where Rob is going to send some really interesting new and original 
thread of ideas. I learned that I do not need to remind him. He is just thinking 
of it, and when he realizes he has something substantial to say, he will. Rob was 
a perfectionist. We worked on a third subject, and decided the results are not so 
great. Rob would never send a paper unless he thinks it’s an important result 
and the writing is clear. 

I was looking forward to meet Rob again soon. We talked about me visiting 
Intel in December. I also waited for him to clear his invited talk about the 
verification of the new Intel processor in the CAV 2004 conference in Boston 
with his management. I would have reminded anyone else, but with Rob I knew 
he is taking care of things. Rob has influenced the CAV community with his 
important contributions. We will remember him. 

Doron Peled 
University of Warwick 
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It is with great pain that I struggle with the unacceptably tragic news of Rob’s 
death. For me, Rob always signified the spirit of youth and intellectual rebellion 
- uncompromising insistence on finding out things for himself and establishing 
his own truth in the way that would satisfy him personally. This independence 
of thought and idealistically high standards often led him in paths infrequently 
traveled achieving many original, ingenious, and fresh insights that were Rob’s 
unique trade mark in science. 

It is extremely difficult to reconcile oneself with the thought that this youthful 
and energetically fresh creative person has been cut in mid stream. The loss to 
science, industry, and each of us personally, is too great to encompass in few 
words. 

Amir Pnueli 

New York University & Weizmann Institute 



I have known and shared many happy times with Rob. He and I shared a grant 
from the EU for many years, which let us eat, drink, and (sometimes) work 
around Europe. Most of all I remember fierce arguments, highlighted by the 
intensity of Rob’s in-your-face burning eyes. These were all totally enjoyable, 
and happily resolved over a more-than-few drinks. 



Mike Reed 
Oxford University 




Fig. 1. Rob, Amir Pnueli and Mike Reed in Lontro, Crete, 1995. 
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Remembering Rob Gerth, I am deeply saddened by the loss of a real friend. 
He was my first Ph.D. student. The difference between him as supervisee and 
me, his supervisor, quickly evaporated because of his rare, incisive intellect, 
his passionate devotion to our field - program verification - his integrity, his 
admirable courage, and his honesty. He was a man to steal horses with. 

He surpassed me scientifically shortly after finishing his thesis. Both of us 
knew that, and for me this was a reason for pride and enjoyment - The Torch 
of Scientific Research had been passed on to a Worthy Successor. 

Rob, Marly Roncken, Niek van Diepen, and I met in the beginning of the 
1980’s as part of an informal research group in program verification at the Uni- 
versity of Utrecht. We had a proof theory for Brinch Hansen’s monitor-based 
language Distributed Processes as goal in mind. Marly contributed our first rep- 
resentative example, her Monkey-Bananas puzzle, and Rob observed that its 
correctness proof involved Owicki’s interference-freedom test. These efforts led 
eventually to our first joint publication in 1982. 

This goal of a proof theory for monitors subsequently evolved into the sub- 
ject of Rob’s thesis, finished after my group had already moved to Eindhoven. 
But Rob wanted to obtain his Ph.D. degree at “his own” University, that of 
Utrecht. Jan van Leeuwen made this possible. I still remember sitting along one 
of Utrecht’s beautiful medieval canals after the ceremony, with my wife Corinne 
and our daughter Jojanneke, trying to convince Rob that a meal for his doc- 
toral committee and his guests would be the appropriate end to a wonderful day. 
Marly convinced him in the end, and I finally noticed they had become a real 
couple. 




Fig. 2. Jan van Leeuwen handing over Rob’s Ph.D. certihcate, 8 May 1989 (left). After 
the ceremony (right), Rob exchanged his tails for a T-shirt version - a present from 
WP in response to Rob’s (reluctant) yielding to the University’s ceremonial dress code. 
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In the meantime we had started participating in the ESPRIT research pro- 
gramme of the European Commission. This was made possible by an eminent 
team of research students, consisting of Rob, Ruurd Kuiper, Kees Huizing, Ron 
Koymans, and Jozef Hooman at Eindhoven, and Frank Stomp and Job Zwiers 
at Nijmegen University. Amir Pnueli commented later that they constituted the 
most gifted single group of Ph.D. students he had ever collaborated with. 

In my second EU research project Rob became Eindhoven’s team leader. 
Those projects led, among others, to the birth of the now dominant CAV series 
of conferences on Computer-Aided Verification. Frequent guests of the chair were 
Allen Emerson, Amir Pnueli, Joseph Sifakis, and many others who were to lift 
program verification later to its current level. Within a couple of years Rob Gerth 
had evolved from a proud citizen of Utrecht to a researcher having a world-wide 
view of program verification with an established scientific reputation, and had 
changed his main line of research to computer-aided verification. 

His parting gift to Eindhoven University was the ESPRIT project VIRES, the 
proposal of which he wrote single-handedly, on the verification of a mobile phone 
protocol. That project was subsequently led at Eindhoven University by his first 
Ph.D. student, Dennis Dams. It constitutes still a topic for invited lectures! 

I had moved in the meantime to the University of Kiel, Germany, Rob moved 
later to Intel, and Dennis Dams to Bell-Labs. An era of scientific excellence and 
vision in program verification in the Netherlands had come to an end. 

Willem-Paul de Roever 

~ University of Kiel 




Fig. 3. Two Ph.D. generations: Dennis Dams with Rob at the PROCOMET workshop 
in Italy, 1994 (left), and Rob with Willem-Paul, Corinne and Sanne de Roever and the 
Shyamasundar family in Utrecht, 1983 (right). 
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I had not seen Rob for two years, but when I got the message of his death he 
was suddenly there again, in a long sequence of memories. The many years in 
Eindhoven, first as M.Sc. advisor, then co-promotor of my PhD thesis. In that 
sense he was my “scientific father” , a role that he fulfilled with dedication. 

Thinking back, what I found most characteristic about Rob was his calmness. 
Always was he willing, and took ample time, to listen patiently to what I had 
to tell him. We discussed long, and often late hours about work and many other 
things. He could continue working on a joint paper, unstirred, until just before, 
or after the submission deadline. And I remember how relieved I was when he 
finally showed up on the platform escalator of Eindhoven station, half a minute 
before the departure of our international train. His face showing only a slight 
surprise about my state of panic. 

With that self-evident calmness he allowed me to continue working on my 
PhD thesis, even well beyond the regulatory four years. I do not actually re- 
member whether I ever told him how grateful I am for that. 

Between 1989 and 1997 we have done many things together: working, eating, 
running, traveling. Memories that we talked about together, not so long ago. 
Memories that now suddenly take a different role. The PROCOMET workshop 
in San Miniato. That restaurant in Haifa where we had dinner each night during 
our working visit to the Technion. Those memories now turn into memories of 
Rob himself. 



Dennis Dams 
Bell Labs 

I am deeply grateful for having had the privilege to work with Rob for the past 
five years. Rob was always one of the first people I consulted whenever I had 
questions on a scientific topic. He was one of the very first people who recognized 
the importance and significance of the GSTE work, and even encouraged me on 
many occasions to write a book on the subject. I remember that he carefully 
read the first draft of the theory paper, and walked into my cubicle one day with 
the copy full of his valuable comments . . . and a coffee stain. He apologized for 
the mess, but I thanked him and told him jokingly that I would like to keep it as 
it would become a great souvenir when he’d be a very famous computer scientist 
some day. I have kept the copy and I will keep it forever as a memory of Rob. 

Recently, Rob and I worked closely on mentoring the protocol verification 
project by Professor Lin in China. He was very enthusiastic and upbeat about it. 
I still remember the words he said when we discussed how to steer this project. 
With his unforgettable smile and an emphatic “No”, he said “Jin, we should 
define the problem very clearly, but we cannot constrain how he is going to solve 
it” . How could one argue with that? 



Jin Yang 
Intel 
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Rob, and Marly almost equally long, I have known since their post-university 
days at the University of Utrecht. We were coached, quite independently and at 
different places, by Willem-Paul de Roever. Initially with him, Rob and I at a 
later stage spent a period of about 10 years at Eindhoven Technical University, 
Rob as leader and later initiator of projects with European Community funding. 

Workwise, that meant interaction on a day-to-day basis, where Rob’s scien- 
tific perceptiveness, and his perseverance and integrity were a constant pleasure 
and inspiration. Spotting a new idea like a vague shadow of a fish, but then 
also catching it - making it precise and finding the appropriate formalism to 
describe it; accomplished artist and craftsman both. To honour Rob’s love for 
books: II Miglior Fabbro - certainly. Socially it meant isolated, few, pinpointed, 
but intense and happy interactions. Like triggered when our children were born 
(there still lives a toy-frog in our bathtub donated by Rob and Marly, though 
politely named after Rob), or when the mood was right, tired-out after work on 
a project assignment at the Weizmann Institute of Science in Israel. 

This mix felt absolutely right. To me Rob was, apart from the driven, talented 
scientist, an unusually complete human being, in his tastes as well as in his 
priorities. Leaving a cinema late at night I once saw Rob and Marly cycling 
around rather hollow-eyed: having worked together on some paper, they had 
forgotten about time and food. But at that moment work was far away; their 
being together radiated happiness and was a joy to look at. 

A furtive try to phrase what Rob means to me: 

A good thing about a friend is that you understand and value him; the better 
thing is that he understands and values you - Rob did. The book “Pallieter” 
of which he once gave me a fine old copy (hunting down one, after noticing 
that I admired his) will always be on my shelf, as Rob will remain in my life. 
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Fig. 4. Jozef Hooman, Rob, Ruurd Kuiper, Kees Huizing, and Ron Koymans, Israel. 



Rob was my co-promotor and we worked together at the Eindhoven University. 
He was also my daily supervisor during my Ph.D. work and I learned so many 
things from him. About English writing, for instance. Maybe also a few bad 
habits, but what I remember most is his attitude to science. He combined strong 
opinions with a very open mind, which is exceptional, I think. Sometimes, when I 
had made a nice exposition of some point of view, he could ask with his eyes wide 
open in what seemed true surprise: “Why?” . This single word made me rethink 
my whole position and suddenly every assumption I made seemed questionable. 
Although he had a very broad knowledge of computer science, he managed to 
approach the research field every time in a fresh and original way. 

His original and sharp thinking also lightened up our coffee break discussions 
and now that he died I realise that I still miss him, even after these years. And 
not only for his conversation, but also for his independent mind, whether it be 
science, politics, or conventions and manners. 

Kees Huizing 
Technical University Eindhoven 

I have written two papers with Rob. He was a challenging collaborator. He took 
nothing for granted and challenged everything. Working with him was like a 
furious tennis match. At times, you could not tell whether it was collaborative 
or competitive, it was so intense. When the end result finally emerged, it was 
always of superb quality, having been polished by clashing intellects. Rob’s un- 
compromising quest for scientific truth forced his collaborators to bring forth 
their best ideas. His untimely departure leaves us with a gaping hole. 

Moshe Vardi 
Rice University 



12 Rob Tristan Gerth: 1956-2003 

When I first met Rob in 1987 he was a student at TU-Eindhoven. All of Willem- 
Paul’s group were impressive, but Rob stood out somehow. His brilliance and 
intensity were so plainly visible. He wasn’t like a student. Instead, he already 
had the maturity and judgement of an excellent junior faculty member. 

I followed Rob’s career with interest. He brought much needed clarity and 
illumination to everything he worked on. I myself was doubly impressed with 
Intel’s seriousness regarding formal verification when I learned they had hired 
Rob. 

Rob was both a gentlemen and a scientist. I myself, his friends and colleagues, 
and the community will miss him enormously. 



Allen Emerson 
University of Texas 



I was shocked to hear the sad news about Rob. Rob was a well respected member 
of our community - an international community of researchers working on spec- 
ification and verification. I met him frequently at workshops and conferences. 

Rob’s intelligence was obvious, and he contributed a great deal to the progress 
of our field. But what impressed me most was his good sense and openness to 
new ideas. He was free of preconceived notions of how things should be done, 
and he was always willing to listen to what others had to say. 



Leslie Lamport 
Microsoft 



Rob was very kind and welcoming to me whenever I visited Intel. Although we 
never worked on a project together, I had a tremendous respect for his scientific 
knowledge and often popped by his desk to consult him on tricky technical 
points. Rob was a generous man and always willing to share his time. 

Rob also had a wonderful sense of humor and was a simulating and enjoyable 
lunch-time companion. I remember well how his face would light up with a smile, 
as he would agree with a gently ironical “of course” to some absurdity or other 
we were discussing. 

In his work, I think what characterised Rob was his tremendous dedication 
to really getting to the bottom of difficult scientific questions. He wasn’t content 
just to have an approximate understanding of a problem and its solution. He 
wanted to really know - and as a first-class computer scientist he had the ability 
to get there too. 



Tom Melham 
University of Oxford 
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Rob had a remarkable mind. He had a deep understanding of computer science 
theory, particularly automata theory. He explained many concepts to me and 
to others on my team. Yet, he was never condescending. He enjoyed sharing his 
knowledge, and he didn’t give up on an explanation until it was clear. 

I knew that Rob had many interests outside of work, but sadly I didn’t learn 
the full scope of those interests until after his death. I have come to understand 
that Rob was a true “Renaissance Man”. He had a library with thousands of 
books and CDs, and the home he shared with Marly is filled with art and sculp- 
ture from around the world. I’m amazed at the breadth and depth of his interests 
and knowledge. 

Rob had a subtle, wry sense of humor. He was often amused by my (usually 
futile) attempts to effect change despite the bureaucracy inherent in a large 
company like Intel. After one of my skirmishes with the powers-that-be, Rob 
presented me with a special award-something that he had invented just for the 
occasion. It was the inaugural Strategic CAD Labs “Don Quixote” award: a 
small porcelain Dutch windmill. He told me that tilting at that windmill would 
often be more productive than “tilting” at bureaucracy. 



Robert Jones 
Intel 
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Long before I met him, I actually tried to ignore Rob. Everywhere I went, people 
were raving about how smart he was, what a great guy he was, etc. At some 
level, I guess I was jealous, and I arrogantly said to myself, “If this Rob Gerth 
guy is such a hot shot in verification, why haven’t I met him yet?” I should have 
heeded my own advice: you can’t tell the difference between someone who is two 
steps ahead of you and someone who is two steps behind you. When someone 
is a bit ahead of you, you can understand what they’re doing and why, and you 
acknowledge that they’re ahead; similarly, when someone is a bit behind you, 
you understand exactly what they’re missing and how they are behind you. But 
when someone is too far ahead, what they say and do is incomprehensible, and 
thereby becomes indistinguishable from gibberish. For me and Rob, it took me a 
lot of hard work and tough thinking before I reached the point that I could start 
to appreciate his comments and insights, and work he had done long before. 
Let me give a few concrete examples: 

• I’ve long considered the verification of multiprocessor memory systems to be 
one of my specialties. This was how I got started in verification, and I’ve 
done quite a bit of work in this area. Only in the past few years, though, 
have I started looking at verifying the conformance of the system to a formal 
memory model, rather than verifying simple ad hoc assertions. As I moved 
into this “next step” in verification, I found that Rob had been there already. 
In fact, he had already edited an entire journal special issue on this topic! 

• I have always ignored work on strange automata variations as interesting, 
but irrelevant theory. When I visited Intel SCL to work on the eminently 
practical GSTE, Rob remarked, “Oh, it’s basically forall-automata. I’m more 
interested in alternating automata.” After weeks of concentrating on GSTE, 
I eventually came to realize, “Oh yeah, I guess it is a forall-automaton.” 
And I’ve started seeing alternating automata (the generalization of forall) 
popping up everywhere in my own research and that of others. 

• The other comment Rob made to me about GSTE was that it was “using a 
graph to structure fixed-point computations” . At the time, I understood his 
comment at a syntactic level: GSTE does indeed use a graph, and various 
fixed-point computations are done in accordance to the graph. Only after a 
year and a half am I starting to see that there was more to his comment, that 
there is a general principle, that there might be promising ways to generalize 
and exploit that general principle. I still don’t think I can see what he saw, 
but I’m starting to catch a glimpse of his ideas. 

Given how far ahead Rob’s thinking and research has been, I take solace in 
knowing that he will continue to teach me things for years, perhaps decades, to 
come. I expect that, with enough hard work and hard thinking. I’ll yet again 
rediscover something of Rob’s, and I’ll be able to see him, with that it’s-really- 
very-obvious-if-you-think-about-it look on his face, showing me the way forward 
when I’m finally ready to follow him. 

Alan Hu 

University of British Golumbia 
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Abstract. This paper reviews the principles behind the paradigm of “abstract 
interpretation via 3-valued logic,” discusses recent work to extend the approach, 
and summarizes on-going research aimed at overcoming remaining limitations 
on the ability to create program-analysis algorithms fully automatically. 



1 Introduction 

Static analysis concerns techniques for obtaining information about the possible states 
that a program passes through during execution, without actually running the program 
on specific inputs. Instead, static-analysis techniques explore a program’s behavior for 
all possible inputs and all possible states that the program can reach. To make this 
feasible, the program is “run in the aggregate” - i.e., on abstract descriptors that repre- 
sent collections of many states. In the last few years, researchers have made important 
advances in applying static analysis in new kinds of program-analysis tools for identi- 
fying bugs and security vulnerabilities [1-7]. In these tools, static analysis provides a 
way in which properties of a program’s behavior can be verified (or, alternatively, ways 
in which bugs and security vulnerabilities can be detected). Static analysis is used to 
provide a safe answer to the question “Can the program reach a bad state?” 

Despite these successes, substantial challenges still remain. In particular, pointers 
and dynamically-allocated storage are features of all modern imperative programming 
languages, but their use is error-prone: 

• Dereferencing NULL-valued pointers and accessing previously deallocated stor- 
age are two common programming mistakes. 

• Pointers and dynamically-allocated storage allow a program to build up complex 
graph data structures. Even when some simple data structure is intended, one or 
more incorrect assignments to pointers, or indirect assignments through pointers, 
can cause bugs whose symptoms are hard to diagnose. 

Because tools for finding bugs and detecting security vulnerabilities need answers 
to questions about pointer variables, their contents, and the structure of the heap\ the 
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usage of pointers in programs is a major obstacle to the goal of addressing software 
reliability by means of static analysis. In particular, the effects of assignments through 
pointer variables and pointer- valued helds make it hard to determine the aliasing rela- 
tionships among different pointer expressions in a program. When less precise pointer 
information is available, the effectiveness of static techniques decreases. 

Although much work has been done on algorithms for flow-insensitive points-to 
analysis [8-10] (including algorithms that exhibit varying degrees of context- sensitivity 
[11-15]), all of this work uses a very simple abstraction of heap-allocated storage: All 
nodes allocated at site s are folded together into a single summary node Ua- Such 
an approach has rather severe consequences for precision. If allocation site s is in a 
loop, or in a function that is called more than once, then s can allocate multiple nodes 
with different addresses. A points-to fact “p points to ris” means that program vari- 
able p may point to one of the nodes that ris represents. For an assignment of the form 
p- >selectorl = q, points-to-analysis algorithms are ordinarily forced to perform 
a “weak update”: that is, selector edges emanating from the nodes that p points to are 
accumulated', the abstract execution of an assignment to a field of a summary node can- 
not “kill” the effects of a previous assignment because, in general, only one of the nodes 
that ris represents is updated on each concrete execution of the assignment statement. 

Such imprecisions snowball as additional weak updates are performed (e.g., for 
assignment statements of the form r->selector2 = p- >selectorl), and the 
use of a flow-insensitive algorithm exacerbates the problem. Consequently, most of the 
literature on points-to analysis leads to almost no useful information about the structure 
of the heap. One study [16] of the characteristics of the results obtained using one of 
the flow-insensitive points-to-analysis algorithms reports that 

Our experiments show that in every points-to graph, there is a single node (the 

“blob”) that has a large number of outgoing flow edges. In every graph, the 

blob has an order of magnitude more outgoing edges than any other node. 

Such imprecision, in turn, leads to overly pessimistic assessments of the program’s 
behavior. Moreover, most of the representations of pointer relationships that have been 
proposed in the literature on points-to analysis cannot express even as simple a fact 
as “x points to an acyclic list”. Such representations are unable to confirm behavioral 
properties, such as (i) when the input to a list-insert program is an acyclic list, the output 
is an acyclic list, and (ii) when the input to a list-reversal program that uses destructive- 
update operations is an acyclic list, the output is an acyclic list. Instead, most points-to- 
analysis algorithms will report that a possibly cyclic structure can arise. For programs 
that use two lists, most points-to-analysis algorithms will report that at the end of the 
program the two lists might share list elements (even when, in fact, the two lists must 
always remain disjoint). 

The failings of conventional pointer-analysis algorithms discussed above are just 
symptomatic of a more general problem: in general, tools for finding bugs and detecting 
security vulnerabilities need answers to questions about a wide variety of behavioral 
properties; these questions can only be answered by tracking relationships among a 
program’s runtime entities, and in general the number of such entities has no hxed 
upper bound. Moreover, the nature of the relationships that need to be tracked depends 
on both the program being analyzed and the queries to be answered. 
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The aim of our work [17] has been to create a parametric framework for program 
analysis that addresses these issues. A parametric framework is one that can be instan- 
tiated in different ways to create different program- analysis algorithms that provide an- 
swers to different questions, with varying degrees of efficiency and precision. The key 
aspect of our approach is the way in which it makes use of 2-valued and 3-valued logic: 
2-valued and 3-valued logical structures - i.e., collections of predicates - are used to 
represent concrete and abstract stores, respectively; individuals represent entities such 
as memory cells, threads, locks, etc.; unary and binary predicates encode the contents of 
variables, pointer-valued structure fields, and other aspects of memory states; and first- 
order formulas with transitive closure are used to specify properties such as sharing, 
cyclicity, reachability, etc. Formulas are also used to specify how the store is affected 
by the execution of the different kinds of statements in the programming language. 

The analysis framework can be instantiated in different ways by varying the predi- 
cate symbols of the logic, and, in particular, by varying which of the unary predicates 
control how nodes are folded together (this is explained in more detail in Sect. 2). The 
specified set of predicates determines the set of properties that will be tracked, and con- 
sequently what properties of stores can be discovered to hold at different points in the 
program by the corresponding instance of the analysis. 

As a methodology for verifying properties of programs, the advantages of the 3- 
valued-logic approach are: 

1. No loop invariants are required. 

2. No theorem provers are involved, and thus every abstract execution step must 
terminate. 

3. The method is based on abstract interpretation [18], and satisfies conditions that 
guarantee that the entire process always terminates. 

4. The method applies to programs that manipulate pointers and heap-allocated data 
structures. Moreover, analyses are capable of performing strong updates during 
the abstract execution of an assignment to a pointer-valued field. 

5. The method eliminates the need for the user to write the usual proofs required 
with abstract interpretation - i.e., to demonstrate that the abstract descriptors that 
the analyzer manipulates correctly model the actual heap-allocated data structures 
that the program manipulates. 

A prototype implementation that implements this approach has been created, called 
TVLA (Three- Valued-Logic Analyzer) [19,20]. 

Points (1) and (2) may seem counterintuitive, given that we work with an undecid- 
able logic (first-order logic plus transitive closure - see footnote 4), but they are really 
properties shared by any verification method that is based on abstract interpretation, and 
hence are consequences of point (3). Points (4) and (5) may be equally surprising - even 
to many experts in the field of static analysis - but are key aspects of this approach: 

• Point (4) has a fundamental effect on precision. In particular, our approach is ca- 
pable of confirming the behavioral properties mentioned earlier, i.e., (i) when the 
input to a list-insert program is an acyclic list, the output is an acyclic list, and (ii) 
when the input to a list-reversal program that uses destructive-update operations 
is an acyclic list, the output is an acyclic list. In addition, when a program uses 
multiple lists that always remain disjoint, our approach can often confirm that fact. 
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Table 1. (a) Declaration of a linked-list datatype in C. (b) Core predicates used for representing 
the stores manipulated by programs that use type List. (We write predicate names in italics and 
code in typewriter font.) 



typedef struct node { 
int data; 
struct node *n; 

} *List; 

(a) 



Predicate 


Intended Meaning 


eq(vi,V 2 ) 

q{v) 

n{v\,V 2 ) 

dle(v\,V 2 ) 


Do v\ and V2 denote the same memory cell? 

Does pointer variable q point to memory cell vl 
Does the n-field of v\ point to 112? 

Is the data-field of v\ less than or equal to that of U2? 



(b) 



• Point (5) is one of the keys for making the approach accessible for users. With 
the methodology of abstract interpretation, it is often a difficult task to obtain an 
appropriate abstract semantics; abstract-interpretation papers often contain com- 
plicated proofs to show that a given abstract semantics is sound with respect to 
a given concrete semantics. With our approach, this is not the case: the abstract 
semantics falls out automatically from a specification of the concrete semantics 
(which has to be provided in any case whenever abstract interpretation is em- 
ployed); the soundness of all instantiations of the framework follows from a single 
meta-theorem ([17, Theorem 4.9]). 

The remainder of the paper is organized as follows: Sect. 2 summarizes the frame- 
work for static analysis from [17]. Sect. 3 describes several applications and extensions. 
Sect. 4 discusses related work. 



2 The Use of Logic for Program Analysis 

Modeling and Abstracting the Heap 
with Logical Structures. In the static- 
analysis framework defined in [17], con- 
crete memory configurations - or stores - 
are modeled by logical structures. A log- 
ical structure is associated with a vocabu- 
lary of predicate symbols (with given ar- 
ities): V = {eq,pi, . . . ,pn} is afinite set 
of predicate symbols, where Vk denotes 
the set of predicate symbols of arity k (and eq € 7^2)- A logical structure supplies a 
predicate for each of the vocabulary’s predicate symbols. A concrete store is modeled 
by a 2-valued logical structure for a fixed vocabulary C of core predicates. Core pred- 
icates are part of the underlying semantics of the language to be analyzed; they record 
atomic properties of stores. Tab. 1 gives the definition of a C linked-list datatype, and 
lists the predicates that would be used to represent the stores manipulated by programs 
that use type List, such as the store shown in Fig. 1. 2-valued logical structures then 
represent memory configurations: the individuals are the set of memory cells; a nullary 
predicate represents a Boolean variable of the program; a unary predicate represents 



^ 5 I 9 I null 

y 

Fig. 1. A possible store, consisting of a four- 
node linked list pointed to by x and y. 
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either a pointer variable or a Boolean-valued field of a record; and a binary predicate 
represents a pointer field of a record^. 

The 2-valued structure S, shown in the upper-left-hand corner of Fig. 2, encodes the 
store of Fig. 1. S’s four individuals, ui, U 2 , U 3 , and U 4 , represent the four list cells. 

The following graphical notation is used for depicting 2-valued logical structures: 

• An individual is represented by a circle with its name inside. 

• A unary predicate p is represented by having a solid arrow from p to each individ- 
ual u for which p{u) = 1, and by the absence of a p-arrow to each individual u' 
for which p{u') = 0. (If predicate p is 0 for all individuals, p is not shown.) 

• A binary predicate q is represented by a solid arrow labeled q between each pair 
of individuals Ui and Uj for which q{ui, uj) = 1, and by the absence of a g-arrow 
between pairs m- and u' for which q{u[, m' ) = 0. 

Thus, in structure S, pointer variables x and y point to individual ui, whose n-field 
points to individual U 2 ; pointer variables t and e do not point to any individual. 

Often we only want to use a restricted class of logical structures to encode stores. 
To exclude structures that do not represent admissible stores, integrity constraints can 
be imposed. For instance, the predicate x{v) of Fig. 2 captures whether pointer variable 
X points to memory cell u; x would be given the attribute “unique”, which imposes the 
integrity constraint that x{v) can hold for at most one individual in any structure. 

The concrete operational semantics of a programming language is defined by spec- 
ifying a structure transformer for each kind of edge that can appear in a control-flow 
graph. Formally, the structure transformer Te for edge e is defined using a collection of 
predicate-update formulas, c{vi, . . . , Vk) = Tc,e(vi, ■ ■ ■ , Vk), one for each core predi- 
cate c G Vk (s-g-, see [17]). These define how the core predicates of a logical structure 
S that arises at the source of e are transformed to create structure S' at the target of e; 
they define the value of predicate c in S" - denoted by c' in the update formulas of Fig. 2 
- as a function of predicates in S. Edge e may optionally have a precondition formula, 
which filters out structures that should not follow the transition along e. 

Canonical Abstraction. To create abstractions of 2-valued logical structures (and 
hence of the stores that they encode), we use the related class of 3-valued logical struc- 
tures over the same vocabulary. In 3-valued logical structures, a third truth value, de- 
noted by 1 /2, is introduced to denote uncertainty: in a 3-valued logical structure, the 
value p{u) of predicate p on a tuple of individuals u is allowed to be 1 /2. 

Definition 1. The truth values 0 and 1 are definite values', 1/2 is an indefinite value. 
For li,l 2 G {0,1/2,!}, the information order is defined as follows: Zi C (2 iff (1 = (2 
or I 2 = 1/2. The symbol U denotes the least-upper-bound operation with respect to C. 



^ To simplify matters, our examples do not involve modeling numeric-valued variables and 
numeric-valued fields (such as data). It is possible to do this by introducing other predi- 
cates, such as the binary predicate die (which stands for “data less-than-or-equal-to”) listed 
in Tah. 1; die captures the relative order of two nodes’ data values. Alternatively, numeric- 
valued entities can be handled by combining abstractions of logical stmctures with previously 
known techniques for creating numeric abstractions [21]. 




20 



Thomas W. Reps, Mooly Sagiv, and Reinhard Wilhelm 




Fig. 2. The top row illustrates the abstraction of 2-valued structure S to 3-valued structure T with {x, y, t, e}-abstraction. The boxes in the tables of unary 
predicates indicate how individuals are grouped into equivalence classes; the boxes in the tables for n and eq indicate how the “truth-blurring quotients” 
are performed. The commutative diagram illustrates the relationship between (i) the transformation on 2-valued structures (defined by predicate-update 
formulas) for the concrete semantics of y = y->n, (ii) abstraction, and (iii) a sound abstract semantics for y = y->n that is obtained by using the 
same predicate-update formulas to transform 3- valued structures. 
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The abstract stores used for program analysis are 3 -valued logical structures that, by 
the construction discussed below, are a priori of bounded size. In general, each 3 -valued 
logical structure corresponds to a (possibly infinite) set of 2-valued logical structures. 
Members of these two families of structures are related by canonical abstraction. 

The principle behind canonical abstraction is illustrated in the top and bottom rows 
of Fig. 2 , which show how 2 -valued structures S and S' are abstracted to 3 -valued 
structures T and T" , respectively. The abstraction function is determined by a subset A 
of the unary predicates. The predicates in A are called the abstraction predicates. Given 
A, the act of applying the corresponding abstraction function is called A-abstraction. 
The canonical abstraction illustrated in Fig. 2 is {a;, y, t, e}-abstraction. 

Abstraction is driven by the values of the “vector” of abstraction predicates for 
each individual w - i.e., for S, by the values x{w), y{w), t{w) and e{w), for w G 
{u\, U2, Us, U4} - and, in particular, by the equivalence classes formed from the indi- 
viduals that have the same vector of values for their abstraction predicates. In S, there 
are two such equivalence classes: (i) {mi}, for which x, y, t, and e are 1, 1, 0, and 
0 , respectively, and (ii) {u2, M3, M4}, for which x, y, t, and e are all 0 . (The boxes in 
the table of unary predicates for S show how individuals of S are grouped into two 
equivalence classes.) All of the members of each equivalence class are mapped to the 
same individual of the 3 -valued structure. Thus, all members of {m2, M3, M4} from S are 
mapped to the same individual in T, called u' similarly, all members of {mi} from S 
are mapped to the same individual in T, called u. 

For each non-abstraction predicate of 2 -valued structure S, the corresponding 
predicate in 3 -valued structure T is formed by a “truth-blurring quotient”. The value 
for a tuple uq in is the join (U) of all p^ tuples that the equivalence relation on 
individuals maps to uq. For instance, 

• In S, n^{ui,ui) equals 0 ; therefore, the value of n^{u, u) is 0 . 

• In S, n^{u2, Ml), n^{us, ui), and n^{us, ui) all equal 0 ; therefore, the value of 

u) is 0. 

• In S, n^{ui,us) and m‘®(mi, M4) both equal 0 , whereas m‘®(mi, M2) equals 1 ; there- 
fore, the value of n^{u, u') is 1/2 (= 0 U 1). 

• In S, n^{u2,us) and n^{us,Ui) both equal 1 , whereas n^{u2,U2), n^{u2,U4), 

m‘®(m3,M2), {us,us), (u4,U2), {u4,us), and n^{u4,U4) all equal 0; 

therefore, the value of vF{u' , m') is 1/2 (= 0 U 1). 

In the upper-left-hand corner of Fig. 2 , the boxes in the tables for predicates n and eq 
indicate these four groupings of values. 

In a 2 -valued structure, the eq predicate represents the equality relation on indi- 
viduals. In general, under canonical abstraction some individuals “lose their identity” 
because of uncertainty that arises in the eq predicate. For instance, eq^{u, m) = 1 be- 
cause M in T represents a single individual of S. On the other hand, u' represents three 
individuals of S and the quotient operation causes eq'^{u',u') to have the value 1 / 2 . 
An individual like u' is called a summary individual. 



^ The names of individuals are completely arbitrary: what distinguishes u' is the value of its 
vector of abstraction predicates. 
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A 3-valued logical structure T is used as an abstract descriptor of a set of 2-valued 
logical structures. In general, a summary individual models a set of individuals in each 
of the 2-valued logical structures that T represents. The graphical notation for 3-valued 
logical structures (cf. structure T of Fig. 2) is derived from the one for 2-valued struc- 
tures, with the following additions: 

• Individuals are represented by circles containing their names. (In Figs. 3 and 4, 
we also place unary predicates that do not correspond to pointer-valued program 
variables inside the circles.) 

• A summary individual is represented by a double circle. 

• Unary and binary predicates with value 1 /2 are represented by dotted arrows. 
Thus, in every concrete structure S that is represented by abstract structure T of Fig. 2, 
pointer variables x and y definitely point to the concrete node of S that u represents. 
The n-field of that node may point to one of the concrete nodes that u' represents; u' is 
a summary individual, i.e., it may represent more than one concrete node in S. Possibly 
there is an n-field in one or more of these concrete nodes that points to another of the 
concrete nodes that u' represents, but there cannot be an n-field in any of these concrete 
nodes that points to the concrete node that u represents. 

Note that 3-valued structure T also represents 

• the acyclic lists of length 3 or more that are pointed to by x and y. 

• the cyclic lists of length 3 or more that are pointed to by x and y, such that the 
back-pointer is not to the head of the list, but to the second, third, or later element. 

• some additional memory configurations with a cyclic or acyclic list pointed to by 
X and y that also contain some garbage cells that are not reachable from x and y. 

That is, T is a hnite representation of an infinite set of (possibly cyclic) concrete lists, 
each of which may also be accompanied by some unreachable cells. Later in this sec- 
tion, we discuss options for fine-tuning an abstraction. In particular, we will use canon- 
ical abstraction to define an abstraction in which the acyclic lists and the cyclic lists 
are mapped to different 3-valued structures (and in which the presence or absence of 
unreachable cells is readily apparent). 

Canonical abstraction ensures that each 3-valued structure has an a priori bounded 
size, which guarantees that a fixed-point will always be reached by an iterative static- 
analysis algorithm. Another advantage of using 2- and 3-valued logic as the basis for 
static analysis is that the language used for extracting information from the concrete 
world and the abstract world is identical: every syntactic expression - i.e., every logical 
formula - can be interpreted either in the 2-valued world or the 3-valued world"*. 

The consistency of the 2-valued and 3-valued viewpoints is ensured by a basic theo- 
rem that relates the two logics. This explains Point (5) mentioned in Sect. 1 : the method 
eliminates the need for the user to write the usual proofs required with abstract interpre- 
tation. Thanks to a single meta-theorem (the Embedding Theorem [17, Theorem 4.9]), 



* Formulas are first-order formulas with transitive closure: a formula over the vocabulary V = 
{eq,pi, . . . ,Pn} is defined as follows (where p*{vi,V2) stands for the reflexive transitive 
closure of p (ui , U2 ) ) : 

p £V,(f> e Formulas, p ::= 0 | 1 | p(ui, . ..,Vk)\ (“'<Fi) I (v^i F2) \ {<fii V P2) 

V € Variables \ (3u: pi) | (Vu: pi) | p*(vi,V2) 
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which shows that information extracted from a 3-valued structure T by evaluating a 
formula Lp is sound with respect to the value of ip in each of the 2-valued structures 
that T represents, an abstract semantics falls out automatically from the specification 
of the concrete semantics. In particular, the formulas that define the concrete semantics 
when interpreted in 2-valued logic define a sound abstract semantics when interpreted 
in 3-valued logic (see Fig. 2). Soundness of all instantiations of the analysis framework 
is ensured by the Embedding Theorem. 

Program Analysis via 3-Valued Logic. A run of the analyzer carries out an abstract 
interpretation to collect a set of 3-valued structures at each program point. This involves 
finding the least fixed-point of a certain set of equations. Because canonical abstraction 
ensures that each 3-valued structure has an a priori bounded size, there are only a fi- 
nite number of sets of 3-valued structures. This guarantees that a fixed-point is always 
reached. The structures collected at program point P describe a superset of all the ex- 
ecution states that can occur at P. To determine whether a property always holds at P, 
one checks whether it holds in all of the structures that were collected there. 



unary preds. binary preds. 




t 

S V 



unary preds. binary preds. 



indiv.|x|y|t |e|rn.x|cn 


n\u\ v! 


eq\u\ u' 


u |1|1|0|0| 1 |0| 


^'WP\ 




u' |0|0|0|0| 1 |0| 




t 

u y 



Fig. 3. The abstraction of 2-valued stmcture S to 3-valued structure U when we use 
{x, y, t, e, r„,x, Cn} -abstraction. In contrast with T, T' , and T" from Fig. 2, U represents only 
acyclic lists of length 3 or more (with no garbage cells). 



Instrumentation Predicates. Unfortunately, unless some care is taken in the design of 
an analysis, there is a danger that as abstract interpretation proceeds, the indefinite value 
1/2 will become pervasive. This can destroy the ability to recover interesting informa- 
tion from the 3-valued structures collected (although soundness is maintained). A key 
role in combating indefiniteness is played by instrumentation predicates, which record 
auxiliary information in a logical structure. They provide a mechanism for the user to 
fine-tune an abstraction: an instrumentation predicate p of arity k, which is defined by 
a logical formula ... ,Vk) over the core predicate symbols, captures a property 

that each fc-tuple of nodes may or may not possess. In general, adding additional in- 
strumentation predicates refines the abstraction, defining a more precise analysis that 
is prepared to track finer distinctions among nodes. This allows more properties of the 
program’s stores to be identified during analysis. 

The introduction of unary instrumentation predicates that are then used as abstrac- 
tion predicates provides a way to control which concrete individuals are merged to- 
gether into summary nodes, and thereby to control the amount of information lost by 
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abstraction. Instrumentation predicates that involve reachability properties, which can 
be defined using transitive closure, often play a crucial role in the definitions of abstrac- 
tions (cf. Fig. 3). For instance, in program-analysis applications, reachability properties 
from specific pointer variables have the effect of keeping disjoint sublists or subtrees 
summarized separately. This is particularly important when analyzing a program in 
which two pointers are advanced along disjoint sublists. Tab. 2 lists some instrumenta- 
tion predicates that are important for the analysis of programs that use type List. 



Table 2. Defining formulas of some commonly used instrumentation predicates. Typically, there 
is a separate predicate symbol Tn,x for every pointer-valued variable x. 



p 


IntendedMeaning 




tn{vi,V 2 ) 

rn,x(v) 

c„(v) 


Is V2 reachable from vi along n-fields? 

Is V reachable from pointer variable x along n-fields? 
Is u on a directed cycle of n-fields? 


n*(vi,V2) 

3ui: x(vi) At„(vi,v) 
3ui : n{v,vi) A t„{vi,v) 



From the standpoint of the concrete semantics, instrumentation predicates represent 
cached information that could always be recomputed by reevaluating the instrumenta- 
tion predicate’s defining formula in the current store. From the standpoint of the abstract 
semantics, however, reevaluating a formula in the current (3-valued) store can lead to 
a drastic loss of precision. To gain maximum benefit from instrumentation predicates, 
an abstract-interpretation algorithm must obtain their values in some other way. This 
problem, the instrumentation-predicate-maintenance problem, is solved by incremen- 
tal computation; the new value that instrumentation predicate p should have after a 
transition via abstract state transformer r from state cr to a' is computed incrementally 
from the known value of p in a. An algorithm that uses r and p’s defining formula 
'tpp{vi, . . . ,Vk) to generate an appropriate incremental predicate-maintenance formula 
for p is presented in [22]. 

The problem of automatically identifying appropriate instrumentation predicates, 
using a process of abstraction refinement, is addressed in [23]. In that paper, the input 
required to specify a program analysis consists of (i) a program, (ii) a characterization 
of the inputs, and (iii) a query (i.e., a formula that characterizes the intended output). 
This work, along with [22], provides a framework for eliminating previously required 
user inputs for which TVLA has been criticized in the past. 



Other Operations on Logical Structures. Thanks to the fact that the Embedding The- 
orem applies to any pair of structures for which one can be embedded into the other, 
most operations on 3-valued structures need not be constrained to manipulate 3-valued 
structures that are images of canonical abstraction. Thus, it is not necessary to per- 
form canonical abstraction after the application of each abstract structure transformer. 
To ensure that abstract interpretation terminates, it is only necessary that canonical ab- 
straction be applied as a widening operator somewhere in each loop, e.g., at the target 
of each back-edge in the control-flow graph. 
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Unfortunately, the simple abstract semantics obtained by applying predicate-update 
formulas to 3-valued structures can be very imprecise. For instance, in Fig. 2, 
y = y- >n sets y to point to the next element in the list. In the abstract semantics, the 
evaluation in structure T of predicate-update formula y'{v) = 3ui : y{vi) A n(vi,v) 
causes (u') to be set to 1/2. Consequently, all we can surmise in T' is that y may 
point to one of the cells that summary node u' represents. In contrast, the canonical 
abstraction of S' is T", which demonstrates that the abstract domain is capable of rep- 
resenting a more precise abstract semantics than the T -to-T' transformation illustrated 
in Fig. 2. 

In [24], it is shown that for a Galois connection defined by abstraction function a 
and concretization function 7, the best abstract transformer for a concrete transformer 
T, denoted by can be expressed as: r** = a o r o 7 . This defines the limit of precision 
obtainable using a given abstract domain; however, it is a non-constructive definition: it 
does not provide an algorithm for finding or applying . 

To help prevent an analysis from losing precision, several other operations on logical 
structures are used to implement a better approximation to the best transformer [17]: 

• Focus is an operation that can be invoked to elaborate a 3-valued structure - al- 
lowing it to be replaced by a set of more precise structures (not necessarily images 
of canonical abstraction) that represent the same set of concrete stores. 

• Coerce is a clean-up operation that may “sharpen” a 3-valued structure by setting 
an indefinite value (1 / 2 ) to a definite value (0 or 1 ), or discard a structure entirely 
if the structure exhibits some fundamental inconsistency (e.g., it cannot represent 
any possible concrete store). 

The transformers used in TVLA make use of Focus, Coerce, and incremental predicate- 
maintenance formulas to implement a better approximation to the best transformer than 
the T-to-T' transformation of Fig. 2. In particular, TVLA is capable of “materializing” 
non-summary nodes from summary nodes. For instance, given T, TVLA’s transformer 
for y = y- >n would create structure T" (among others) - in essence, materializing U 2 
out of u' . Materialization permits the abstract execution of an assignment to a pointer- 
valued field of a newly created non-summary node to perform a strong update. 



Dynamically Allocated Storage. One way to model the semantics of x = malloc ( ) 
is to model the free-storage list explicitly [22], so as to exploit materialization: 




A malloc is modeled by advancing the pointer freelist into the list, and returning 
the memory cell that it formerly pointed to. A free is modeled by inserting, at the 
head of freelist’s list, the cell being deallocated. 



3 Applications and Extensions 

Interprocedural Analysis. The application of canonical abstraction to interprocedural 
analysis of programs with recursion has been studied in both [25] and [26]. In [25], the 
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main idea is to expose the runtime stack as an explicit “data structure” of the concrete 
semantics; that is, activation records are individuals, and suitable core predicates are 
introduced to capture how activation records are linked together to form a stack. Instru- 
mentation predicates are used to record information about the calling context and the 
“invisible” copies of variables in pending activation records on the stack. 

The analysis in [26] is based on logical structures over a doubled vocabulary V^V , 
where V' = {p' \ p & V} and l±l denotes disjoint union. This approach creates a 
finite abstraction that relates the predicate values for an individual at the beginning of 
a transition to the predicate values for the individual at the end of the transition. Such 
two-vocabulary 3-valued structures are used to create a summary transformer for each 
procedure P, and the summary transformer is used at each call site at which P is called. 



Checking Multithreaded Systems. In [27], it is shown how to apply 3-valued logic 
to the problem of checking properties of multithreaded systems. In particular, [27] ad- 
dresses the problem of state-space exploration for languages, such as Java, that allow 
dynamic creation and destruction of an unbounded number of threads (as well as dy- 
namic storage allocation and destructive updating of structure fields). Threads are mod- 
eled by individuals, which are abstracted using canonical abstraction - in this case, the 
collection of unary thread properties that hold for a given thread. The use of this naming 
scheme automatically discovers commonalities in the state space, but without relying 
on explicitly supplied symmetry properties, as in, for example, [28]. The analysis algo- 
rithm given in [27] builds and explores a 3-valued transition system on-the-fly. Unary 
core predicates are used to represent the program counter of each thread object; Focus 
is used to implement nondeterministic selection of a runable thread. 



Numeric Abstractions. The abstractions described in Sect. 2 are capable of represent- 
ing pointer variables, their contents, and the structure of the heap, but have no direct 
way of representing the actual data items that are stored in the nodes of data struc- 
tures. Recent work [21] has coupled canonical abstraction with a variety of previously 
known numeric abstractions; intervals, congruences, polyhedra [29], and various re- 
strictions on polyhedral domains (such as difference constraints [30, 31] and 2-variable 
constraints [32]). These overapproximate the states that can arise in a program using 
sets of points in a fc-dimensional space. However, when canonical abstraction is used 
to create bounded-size representations of memory configurations, the number of nodes 
in an abstract descriptor is different at different points in the program; for numeric ab- 
stractions, this means that the number of axes changes from program point to program 
point - i.e., there is not a fixed value of k. To capture numeric properties in such a sum- 
marizing framework, an analysis needs to be able to capture the relationships among 
values of groups of numeric objects, rather than relationships among values of individ- 
ual numeric objects [ 21 ]. 



Best Abstract Transformers. As mentioned in Sect. 2, for a Galois connection (a, 7 ), 
a non-constructive dehnition of the best abstract transformer for concrete transformer 
T can be expressed as r** = aoroj. This dehnes the limit of precision obtainable 
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using a given abstract domain, but does not provide an algorithm for finding or applying 
tK Graf and Saidi [33] showed that decision procedures can be used to generate best 
abstract transformers for abstract domains that are finite Cartesian products of Boolean 
values. (The use of such domains is known as predicate abstraction.) 

The ability to perform abstract interpretation using best abstract transformers could 
play a key role in combating indefiniteness in 3-valued structures. It would ensure that 
an abstract interpretation computes answers that are precise up to the inherent limita- 
tions of the abstraction in use. In recent work, we have made a start towards this goal. 
In particular, we have defined two approaches to computing best transformers for ap- 
plications that use canonical abstraction [34, 35]. 

Applications. Some of the problems to which the 3-valued-logic approach has been 
applied include the following: In [36], TVLA was used to establish the partial cor- 
rectness of bubble-sort and insert-sort routines for sorting linked lists. The abstraction- 
refinement method of [23] was used to extend this work to address stability and anti- 
stability properties of sorting routines. TVLA has also been used to demonstrate the 
total correctness of a mark-and-sweep garbage collector operating on an arbitrary heap. 
In Java, once an iterator object Oi is created for a collection Oc, Oi may be used only 
as long as Oc remains unmodified, not counting modifications made via op, otherwise 
a “concurrent modification exception” is thrown. In [37], TVLA was used to create a 
verification tool for establishing the absence of concurrent modification exceptions. 

In the area of multithreaded systems, the 3-valued-logic approach has been used to 
establish the absence of deadlock for a dining-philosophers program that permits there 
to be an unbounded number of philosophers [27], as well as to establish the partial 
correctness of two concurrent queue algorithms; these results were obtained without 
imposing any a priori bound on the number of allocated objects and threads [38]. 

4 Related Work 

Predicate Abstraction. Canonical abstraction is sometimes confused with predicate 
abstraction, which has been used in a variety of systems [33,39,6,40]. At one level, 
predicate abstraction and canonical abstraction use essentially the same mechanism: 

• Predicate abstraction can be used to abstract a possibly-infinite transition system 
to a finite one: concrete states of the transition system are grouped into abstract 
states according to the values of a vector of properties. The transition relation is 
quotiented by the equivalence relation induced on concrete states. 

• Canonical abstraction is used to abstract a possibly-infinite logical structure to a 
finite 3-valued one: concrete individuals are mapped to abstract individuals ac- 
cording to the values of a vector of unary abstraction predicates; all other predi- 
cates are quotiented by the equivalence relation induced on concrete individuals. 

However, as used in [17], canonical abstraction is applied to encodings of stores as 
logical structures, and machinery is developed to use 3-valued structures to define a 
parametric abstract domain for abstract interpretation. Predicate abstraction has also 
been used to define a parametric abstract domain [41]. Thus, an alternative comparison 
criterion is to consider the relationship between the two parametric abstract domains: 
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• Predicate abstraction yields a parametric abstract domain based on finite Cartesian 
products of Booleans (i.e., nullary predicates). An abstract value consists of a 
finite set of finite-sized vectors of nullary predicates [41], 

• Canonical abstraction yields a parametric abstract domain based on 3-valued log- 
ical structures. An abstract value consists of a finite set of finite-sized 3-valued 
structures [17]. 

A special case of canonical abstraction occurs when no abstraction predicates are used 
at all, in which case all individuals are collapsed to a single individual. When this 
is done, in almost all structures the only useful information remaining resides in the 
nullary core and instrumentation predicates. Predicate abstraction can be seen as go- 
ing one step further, and retaining only the nullary predicates. From this point of view, 
canonical abstraction is strictly more general than predicate abstraction. 




Fig. 4. (a) Transition diagram for a stoplight; (b) transition diagram abstracted via the method of 
[42] when green and yellow are mapped to go; (c) transition diagram abstracted via canonical 
abstraction, using red(v) as the only abstraction predicate. 



Existential Abstraction. Canonical abstraction is also related to the notion of existen- 
tial abstraction used in [43,42]. However, canonical abstraction yields 3-valued predi- 
cates and distinguishes summary nodes from non-summary nodes, whereas existential 
abstraction yields 2-valued predicates and does not distinguish summary nodes from 
non-summary nodes. Fig. 4 shows the transition diagram for a stoplight - an example 
used in [42] - abstracted via the method of [42] (Fig. 4(b)) and via canonical abstrac- 
tion, using red{v) as the only abstraction predicate (Fig. 4(c)). With existential abstrac- 
tion, soundness is preserved by restricting attention to universal formulas (formulas in 
ACTL*). With canonical abstraction, soundness is also preserved by switching logics, 
although in this case there is no syntactic restriction; we switch from 2-valued first- 
order logic to 3-valued first-order logic. An advantage of this approach is that if ip is 
any formula for a query about a concrete state, the same syntactic formula <p can be 
used to pose the same query about an abstract state. 

One-Sided versus Two-Sided Answers. Most static-analysis algorithms provide 2- 
valued answers, but are one-sided: an answer is definite on one value and conservative 
on the other. That is, either 0 means 0, and 1 means “maybe”; or 1 means 1, and 0 
means “maybe”. In contrast, by basing the abstract semantics on 3-valued logic, definite 
truth and definite falseness can both be tracked, with 1/2 capturing indehniteness. (To 
determine whether a formula ip holds at P, it is evaluated in each of the structures that 
are collected at P. The answer is the join of these values.) This provides insight into 
the true nature of the one-sided approach. For instance, an analysis that is definite with 
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respect to 1 is really a 3-valued analysis that conflates 0 and 1/2 (and uses 0 in place of 

1 /2). (It should be noted that with a two-sided analysis, the answers 0 and 1 are definite 

with respect to the concrete semantics as specified, which may itself overapproximate 

the behavior of the actual system being modeled.) 
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Abstract. We outline a theorem- proving approach to verify pipelined 
machines. Pipelined machines are complicated to reason about since they 
involve simultaneous overlapped execution of different instructions. Nev- 
ertheless, we show that if the logic used is sufficiently expressive, then 
it is possible to relate the executions of the pipelined machine with the 
corresponding Instruction Set Architecture using (stuttering) simulation. 
Our methodology uses first-order quantification to define a predicate that 
relates pipeline states with ISA states and uses its Skolem witness for 
correspondence proofs. Our methodology can be used to reason about 
generic pipelines with interrupts, stalls, and exceptions, and we demon- 
strate its use in verifying pipelines mechanically in the ACL2 theorem 
prover. 



1 Introduction 

This paper is concerned with formal verification of pipelined machines. Pipelin- 
ing is a key feature in the design of today’s microprocessors. It improves perfor- 
mance by temporally overlapping execution of different instructions; that is, by 
initiating execution of a subsequent instruction before a preceding instruction 
has been completed. To formally verify modern microprocessors, it is imperative 
to be able to effectively reason about modern pipelines. 

Formal verification of microprocessors normally involves showing some cor- 
respondence between a microarchitectural implementation (MA) and its cor- 
responding Instruction Set Architecture (ISA). The ISA is typically a non- 
pipelined machine which executes each instruction atomically. For non-pipelined 
microprocessors, one typically shows simulation correspondence (Fig. 1): If an 
implementation state is externally equal to a specification state, then execut- 
ing one instruction in both the microarchitecture and the ISA results in states 
that are externally equal [1,2]. This implies that for every (infinite) execution 
of MA, there is a corresponding (infinite) execution of the ISA with the same 
visible behavior. However, for pipelined machines, such correlations are difficult 
to establish because of latency in the pipelines. As a result, a large number 
of correspondence notions have been used to reason about pipelined machines. 

* Support for this work was provided in part by the SRC under contract 02-TJ-1032. 
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Fig. 1. Commutative Diagram for Simulation Correspondence 



As features like interrupts, and out-of-order instruction execution have been 
modeled and reasoned about, notions of correctness have had to be modified 
and extended to account for these features. Consequently, the correspondence 
theorems have become complicated, difficult to understand, and even contro- 
versial [3]. Further, the lack of uniformity has made composition of proofs of 
different components of a modern processor cumbersome and difficult. 

In this paper, we argue that in a logic that allows arbitrary first-order quan- 
tification and Skolemization, simulation-based correspondence is still effective 
and sufficient for reasoning about modern pipelines. Our notion of correctness is 
stuttering simulations [2], and preserves both safety and liveness properties. The 
chief contribution of this work is to show how to effectively use quantification to 
define a correspondence relating the states of a pipelined machine with those of 
its ISA. Our work makes use of the Burch and Dill pipeline flushing diagram [4] 
(Fig. 2), to derive the correspondence without complicated invariant definitions 
or characterization of the precise timing between the execution of different in- 
structions in the pipeline. Indeed, our techniques are generic and we are able 
to apply the same notion of correspondence to pipelines with stalls, interrupts, 
exceptions, and out-of-order execution, and verify such machines mechanically 
with reasonable automation. 

In this section, we first survey related approaches and notions of correctness 
used in verification of pipelined machines. We then describe our approach and 
proof methodology in greater detail. 



1.1 Related Work 

Reasoning about pipelines has been an active area of research. Aagaard et al. [3] 
provides an excellent survey and comparison of the different techniques. Some 
of the early studies have used skewed abstraetion functions [5, 6] to map the 
states of the pipeline at different moments to a single ISA state. The skewed 
abstraction functions need to precisely characterize all timing delays and hence 
their definitions are both complex and vulnerable to design modifications. By 
far the most popular notion of correctness used has been the Burch and Dill 
pipeline flushing diagram [4] (Fig. 2). The approach is to use the implementa- 
tion itself to flush the pipeline by not permitting new instructions to be fetched, 
and projecting the programmer-visible components of the flushed state to de- 
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Fig. 2. Burch and Dill Pipeline Flushing Diagram 

fine the corresponding ISA state. Sawada and Hunt [7, 8] use a variant called 
flush point alignment to verify a complicated pipelined microprocessor with ex- 
ceptions, stalls, and interrupts using the ACL2 theorem prover. They use an 
intermediate data structure called MAETT to keep track of the history of ex- 
ecution of the different instructions through the pipeline. Hosabettu et al. [9] 
use another variant, flush point refinement, and they verify a deterministic out- 
of-order machine using completion functions. Both of these approaches require 
construction of complicated invariants to demonstrate correlation between the 
pipelined machine and the ISA. In addition, there have also been compositional 
model checking approaches to reason about pipelines. For example, Jhala and 
McMillan [10] use symmetry, temporal case-splitting, and data-type reduction 
to verify out-of-order pipelines. While more automatic than theorem proving, 
applicability of the method in practice requires the user to explicitly decompose 
the proof into manageable pieces to alleviate state explosion. Further, it relies 
on symmetry assumptions which are often violated by the heterogeneity of mod- 
ern pipelines. Finally, there has been work on using a combination of decision 
procedures and theorem proving to verify modern pipelines [11, 12], whereby de- 
cision procedures have been used to assist the theorem prover in verifying state 
invariants. 

Manolios [13] shows logical problems with the Burch and Dill notion and pro- 
vides a general notion of correctness using well-founded equivalence bisimulation 
(WEB) [14]. He uses it to reason about several variants of Sawada’s pipelines [7]. 
This, to our knowledge, is the first attempt in reasoning about pipelines using 
a general-purpose correctness notion expressing both safety and liveness prop- 
erties. Our notion of correctness and proof rules are a direct consequence of the 
work with WEBs. The basic difference between our approach and that of Mano- 
lios is in the methodology used for relating pipeline states with ISA states. Since 
this methodology is a key contribution of this paper, we compare our approach 
with his in greater detail after outlining our techniques in Section 3. 

1.2 Overview of Our Approach 

We now describe how it is possible to relate pipeline states with the correspond- 
ing ISA using simulation-based correspondence in spite of overlapped execution 
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of different instructions. Note that our quest here is for a generic approach rather 
than the verification of one specific design. To convey the basic idea we present a 
simplified view of pipelined machines, namely in-order execution and completion 
of at most one instruction in a clock cycle. A precise mathematical description is 
presented later in Section 3, and features like interrupts, out-of-order execution 
etc., are dealt with in Section 4. 

Consider the pipelined machine in some state MA, where a specific instruc- 
tion i\ is poised to complete. Presumably, then, before reaching MA, the ma- 
chine must have encountered some state MA\ in which i\ was poised to enter 
the pipeline. Call MA\ the witnessing state of MA. Assuming that instructions 
are completed in order, all and only the incomplete instructions at state MA\ 
(meaning, all instructions before i\) must complete before the pipeline can reach 
state MA starting from MA\. Now consider the state MAip obtained by flushing 
MA\. The flushing operation also completes all incomplete instructions without 
fetching any new instructions; that is, MA\p has all instructions before i\ com- 
pleted, and is poised to fetch ii. If only completed instructions affect the visible 
behavior of the machine, then this suggests that MAip and MA must be exter- 
nally equab . 

Based on the above intuition, we define a relation sim to correlate pipeline 
states with ISA states: MA is related to ISA if and only if there exists a witnessing 
state MAi such that ISA is the projection of the visible components of MAip. 
Recall that projection preserves the visible components of a state. From the 
arguments above, whenever states MA and ISA are related by sim they are 
externally equal. Thus to establish simulation correspondence (Fig. 1), we need 
to show that if MA is related to ISA, and MA' and ISA' are states obtained by 
1-step execution from MA and ISA respectively, then MA' and ISA' are related 
by sim. Our approach is shown in Fig. 3. Roughly, we show that if MA and ISA 




Fig. 3. Use of Burch and Dill and Symbolic Execution to obtain Simulation-based 
refinement proofs 



^ Notice that MA and MAip would possibly have different values of the program 
counter. This is normally addressed [13] by excluding the program counter from the 
observations (or labels) of a state. 
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are related, and MAi is the corresponding witnessing state for MA, then one 
can construct a corresponding witnessing state for MA' by running the pipeline 
for a sufficient number of steps from MA\. In particular, ignoring the possibility 
of stalls (or bubbles) in the pipeline, the state following MAi, which is Myl), is 
a witnessing state of MA' , by the following argument. Execution of the pipeline 
for one cycle from MA completes ii and the next instruction, i 2 , after ii in 
program order is poised to complete in state MA' . The witnessing state for MA' 
thus should be poised to initiate in the pipeline. And indeed, execution for 
one cycle from MA\ leaves the machine in a state in which i 2 is poised to enter 
the pipeline. (We make this argument more precise in the context of an example 
pipeline using Lemma 2 in Section 3.) Finally, the correspondence between MA) 
and ISA' follows by noting that the MA states MA\, MA'i, and ISA states ISA 
and ISA' satisfy the Burch and Dill pipeline flushing diagram. 

The reader should note that the above proof approach depends on our de- 
termining the witnessing state MA\ given MA. However, since MA\ is a state 
that occurs in the “past” of MA, MA might not retain sufficient information for 
computing MA\. In general, to compute witnessing states, one needs to define an 
intermediate machine to keep track of the history of execution of the different 
instructions in the pipeline. However, an observation of this paper is that given 
a sufficiently expressive logic, the witnessing state need not be constructively 
computed. Rather, we can simply define a predicate specifying “some witnessing 
state exists” . Skolemization of the predicate then produces a witnessing state. 

In the presence of stalls, a state MA might not have any instruction poised to 
complete. Even for pipelines without stalls, no instruction completes for several 
cycles after the initiation of the machine. Correspondence in the presence of such 
bubbles is achieved by allowing finite stutter in our verification framework. In 
other words, if MA is related to ISA, and MA has no instruction to complete, 
then MA' is related to ISA instead of ISA'. Our proof rules guarantee such 
stuttering is finite, allowing us to preserve both safety and liveness properties. 
Note that correspondence frameworks allowing a finite stutter are known to be 
effective in relating two system models at different levels of abstraction using 
simulation and bisimulation. In ACL2, stuttering has been used with simulation 
and bisimulation to verify concurrent protocols [15,14], and indeed, pipelined 
machines [13]. 

The remainder of the paper is organized as follows. We discuss our notion 
of correspondence and the associated proof rules in Section 2. In Section 3, we 
discuss the correspondence proof for an example pipelined machine. In Section 4, 
we show how our method can be used to reason about pipelines with interrupts, 
out-of-order execution etc. Finally, we conclude in Section 5. All the theorems 
described here have been proven with the ACL2 theorem prover [16, 17]. ACL2 is 
an essentially first-order logic of total recursive functions, with induction up to cq 
and support for first-order quantification via Skolemization. However, this paper 
does not assume any prior exposure to ACL2; we use traditional mathematical 
notations for our presentation. 
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2 Refinements 

Our proof rules relate infinite executions of two computing systems using single- 
step theorems. The rules are adapted from the proof rules for stuttering simula- 
tion [2], and a direct consequence of work with WEBs [14]. 

Computing systems are typically modeled in ACL2 as state machines by 
three functions namely init, step, and label, with the following semantics: 

~ The constant function initQ returns the initial state of the system. 

— Given a state s and an external input i, the function step{s, i) returns the 
state of the system after one clock cycle. 

— For a state s, the function label{s) returns valuations of the observable com- 
ponents of s 

Such models are common in ACL2 and have been found useful in verification 
of several hardware and software systems [18, 1]. Also, since ACL2 is a logic of 
total functions, the functions step and label above are total. 

We now describe the proof rules. Given models impl = {initj , step j , labelj), 
and spec = {inits, stepg, labels), the idea is to define binary predicates sim and 
commit with the following properties: 

1 . sim{initi{) , inits{)) , 

2. ypi,ps ■ sim{pi,ps) ^ labeli{pi) = labels{ps), 

3- Vp/,ps,z,3j : sim{pi,ps) A commit{pi,i) ^ sim{stepj(j>i,i), steps(psj)), 

4. ypi,ps,i : sim{pi,ps) f\^commit{pi,i) sim{stepj{pi,i),ps)- 

Informally, sim relates the states of impl with the states of spec, and commit 
determines whether the “current step” is a stuttering step for spec. The above 
four conditions guarantee that for every (infinite) execution of impl, there exists 
a corresponding (infinite) execution of spec that has the same observations up 
to stuttering. To guarantee that stuttering is finite, we define a unary function 
rank with the following properties: 

5. Vp : rank{p) G W , where IT is a set known to be well-founded according 
some ordering relation 

6. \/pi,ps,i : sim{ps,pi) /\^commit{pi,i) rank{stepj{pi,i)) -< rank{pi). 

We say that impl is a (stuttering) refinement of spec, denoted {impl >> spec), if 
there exist functions sim, commit, and rank that satisfy the six conditions above. 
The proof rules essentially guarantee that impl is a simulation of spec up to finite 
stuttering. The rules are analogous to proof rules for stuttering simulation [2] 
with the difference that our rules allow only one-sided stutter. We have not 
yet found a problem in which two-sided stutter has been necessary. Notice also 
that the notion of refinements is transitive, and this allows us to hierarchically 
compose proofs of different components of large practical systems. 

Our description above shows how to relate two non-deterministic system 
models. A computing system M = {init, step, label) is deterministic if step is a 

^ The label is analogous to the valuation of atomic propositions in a Kripke Structure. 
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function of only the first argument, namely the current state. If impl is deter- 
ministic, then commit is a function of the current state alone. In addition, if spec 
is deterministic, then the condition 3 is modified to eliminate the choice of input 
for the next step of spec: 

3. ypi,ps, : sim{pi,ps) A commit{pi) ^ sim{stepi{pi), stepsips))- 

3 Pipeline Verification 

We now show how to reason about pipelines using the correctness notion de- 
scribed in Section 2. For clarity of presentation, we consider a deterministic 
5-stage example pipeline with in-order execution. Features like arbitrary stalls, 
interrupts, and out-of-order execution are dealt with in Section 4. 

Our pipeline (Fig. 4) has fetch, decode, operand-fetch, execute, and write- 
back stages. The machine has a decoder, an ALU, a register file, and 4 latches 



Memory PC Register file 




Fig. 4. A simple 5-stage pipeline 



to store intermediate computations. Instructions are fetched from the memory 
location pointed to by the program counter (PC) and loaded into the first latch. 
The instructions then proceed in sequence through the different pipe stages in 
program order until they retire. Every latch has a valid bit to indicate if the latch 
is non-empty. We use a 3-address instruction format consisting of an opcode, two 
source registers, and a target register. While the machine allows only in-order 
execution, data forwarding is implemented from latch 4 to latch 3. Nevertheless, 
the pipeline can still have 1-cycle stall: If the instruction Z 2 at latch 2 has as one 
of its source registers the target of the instruction i^ at latch 3, then Z 2 cannot 
proceed to latch 3 in the next state when completes its ALU operation. 

The executions of the pipeline above can be defined as a deterministic model 
MA = {initi, stepj, labelj). We assume that initj has some program loaded in the 
memory, and some initial configuration of the register file, but an empty pipeline. 
Since we are interested in the updates of the register file, we let labelj preserve 
the register file of a state. Finally, the ISA machine ISA = {inits, stepg, labels) is 
merely one that executes the instruction pointed to by PC atomically; that is, it 
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fetches the correct operands, applies the corresponding ALU operation, updates 
the register file, and increments PC in one atomic step. 

Notice that we have left the actual instructions unspecified. Our proof ap- 
proach does not require complete specification of the instructions but merely the 
constraint that the ISA performs analogous atomic update for each instruction. 
In ACL2, we use constrained functions for specifying the updates applied to 
instructions at every stage of the pipeline. 

Our goal now is to show that (MA >> ISA). We define commit so that 
commit{MA) is true if and only if latch 4 is valid in MA. Notice that whenever 
an MA state MA satisfies commit, some instruction is completed at the next 
step. It will be convenient to define functions characterizing partial updates 
of an instruction in the pipeline. Consequently, we define four functions runi, 
ruu 2 , runs, and runs, where rurii{MA,inst) “runs” instruction inst for the first 
i stages of the pipe and updates the i-th latch. For example, run 2 {MA,inst) 
updates latch 2 with the decoded value of the instruction. In addition, we define 
two functions, flush and stalled. Given a pipeline state MA, flush(MA) returns 
the flushed pipeline state MAp, by executing the machine for sufficient number 
of cycles without fetching any new instruction. The predicate stalled is true of a 
state MA if and only if both latches 2 and 3 are valid in MA, and the destination 
for the instruction at latch 3 is one of the sources of the instruction at latch 
2. Notice that executing a pipeline from a stalled state does not allow a new 
instruction to enter the pipeline. Finally the function proj(MA) projects the PC, 
memory, and register file of MA to the ISA. 

Lemma I is the Burch and Dill correctness for pipelines with stalls and was 
proved automatically by ACL2 using symbolic simulation. 

Lemma 1. For each pipeline state MA, 

nroi(Hush(sten (MAA) = ! steps{proj{flush{MA))) tf ^stalled(MA) 

P J[fl [ Pi\ ))) yproj{flush{MA)) otherwise 



We now define witnessing states. Given states MA\ and MA, MA\ is a witnessing 
state of MA, recognized by the predicate witnessing{MAi , MA), if (1) MA\ is not 
stalled, and (2) MA can be derived from MA\ by the following procedure: 

1. Flush MAi to get the state MAip. 

2. Apply the following update to MAip for f = 4, 3, 2, 1: 

— If latch i of MA is valid then apply rurii with the instruction pointed to 
by PC in MAip, correspondingly update latch i of MAxp, and advance 
PC; otherwise do nothing. 

We now define predicate sim as follows: 

— sim{MA, ISA) = {3MAi : {witnessing{MAi, MA)) 

A{ISA = proj{flush{MAi)))) 

Now we show how the predicate sim can be used to show the conditions 1-6 in 
Section 2. First, the function rank satisfying conditions 5 and 6 can be defined by 
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simply noticing that for any state MA, some instruction always advances. Hence 
if i is the maximum number in MA such that latch i is valid, then the quantity 
(5 — i) always returns a natural number (and hence a member of a well-founded 
set) that decreases at every step. (We take i to be 0 if MA has no valid latch.) 
Also, witnessing{initii) , initii)) holds, implying condition 1. Further, condition 2 
is trivial from definition of witnessing. We therefore focus here on only conditions 
3 and 4. We first consider the following lemma: 

Lemma 2. For all MA, MAi such that witnessing{MAi, MA): 

1. witnessing{MAi, stepj{MA)) if ^commit{M A) 

2. witnessing{stepj{stepj{MAi)), stepj{MA)) if commit{MA) 

/\stalled{step j{MAi)) 

3. witnessing{stepj{MAi),stepj{MA)) if commit{MA) /\ ^stalled{stepj{MAi)) 

The lemma merely states that if MA is not a commit state, then stepping 
from MA preserves the witnessing state, and otherwise the witnessing state for 
stepj(MA) is given by the “next non-stalled state” after MAi. The lemma can 
be proved by symbolic execution of the pipelined machine. We can now prove 
the main technical lemma that guarantees conditions 3 and 4 of Section 2. 

Lemma 3. For all MA and ISA, such that sim{MA, ISA): 

1. sim{stepj{MA),ISA) if ^commit{MA). 

2. sim{stepj{MA), stepg{ISA)) if commit{MA). 

Proof. Let MAi be the Skolem witness of sim{MA, ISA). Case 1 follows from 
Lemma 2, since witncssing{MAi, stepj{MA)) holds. For case 2, we consider only 
the situation ^stalled{stepj{MAi)) since the other situation is analogous. But 
by Lemma 1 and definition of witnessing, proj{flush{stepj{MAi))) = stepg(ISA). 
The result now follows from Lemma 2. □ 

We end this section with a brief comparison between our approach and that 
of Manolios [13]. We focus on a comparison with this work since unlike other 
related work, this approach uses a uniform notion of correctness that is appli- 
cable to both pipelined and non-pipelined microarchitectures. Indeed our use of 
stuttering simulation is a direct consequence of this work. The basic difference is 
in the techniques used to define the correspondence relation to relate MA states 
with ISA states. While we define a quantified predicate to posit the existence 
of a witnessing state, Manolios defines a refinement map from the states of MA 
to states of ISA as follows: Point the PC to the next instruction to complete 
and invalidate all the instructions in the pipe^. Notice immediately that the 
method requires that we have to keep track of the PC of each intermediate in- 
struction. Also, as the different pipeline stages update the instruction, one needs 

® Manolios also describes a “flushing proof” and shows that flushing can relate incon- 
sistent MA states to ISA states. But our application of flush is different from his 
in that we flush MAi rather than the current state MA. Our approach has no such 
problems. 
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some invariant specifying the transformations on each instruction at each stage. 
Short of computing such invariants based on the structure of the pipeline and 
the functionality of the different instructions, we believe that any generic ap- 
proach to determine invariants will reduce his approach to ours, namely defining 
a quantified predicate to posit the existence of a witnessing state. 

4 Generalization 

Stuttering simulation can be used to verify pipelines with advanced features. In 
this section, we consider arbitrary but finite stalls, interrupts, and out-of-order 
instruction execution. Stuttering simulation cannot be used directly for machines 
with out-of-order completion and multiple instruction completion. In Section 4.4 
we discuss an approach to circumvent such limitations. 



4.1 Stalls 

The pipeline in Fig. 4 allowed single-cycle stalls. Pipelines in practice can have 
stalls ranging over multiple cycles. If the stall is finite, it is easy to use stut- 
tering simulation to reason about such pipelines. Stalls affect Lemma 2 since 
given a witnessing state MAi of MA, the witnessing state for stepj(MA) is 
given by the “next non-stalled state” after MAi. But such a state is given by 
clock{stepj{MAi)), where the function clock (defined below) merely counts the 
number of steps to reach the first non-stalled state. Finiteness of stalls guarantees 
that the recursion terminates. 

clorklA^l'^ if -stolle(i(s) 

(1-1- clock{stepj{s)) otherwise 



4.2 Interrupts 

Modern pipelines allow interrupts and exceptions. To effectively reason about 
interrupts, we model both MA and ISA as non-deterministic machines, where 
the “input parameter” is used to decide whether the machine is to be interrupted 
at the current step or proceeds with normal execution. Recall that our notion of 
correspondence can relate non-deterministic machines. Servicing the interrupt 
might involve an update of the visible components of the state. In the ISA, we 
assume that the interrupt is serviced in one atomic step, while it might take 
several cycles in MA. 

We specify the witnessing states for an interruptible MA state as follows: 
MAi is a witnessing state of MA if either (i) MA is not within any interrupt and 
MAi initiates the instruction next to be completed in MA, or (ii) MA is within 
some interrupt and MA\ initates the corresponding interrupt. Then commit is 
true if either the current step returns from some interrupt service or completes 
an instruction. Assuming that pipeline executions are not interleaved with the 
interrupt processing, we can then show (MA >> ISA) for such non-deterministic 
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machines. We should note here that we have not analyzed machines with nested 
interrupts yet. But we believe that the methodology can be extended for nested 
interrupts by the witnessing state specifying the initiation of the most recent 
interrupt in the nest. 



4.3 Out-of-Order Execution 

Our methodology can handle pipelines with out-of-order instruction execution 
as long as the instructions are initiated to the pipeline and completed in program 
order. For a pipeline state MA, we determine the instruction i\ that is next to 
be completed, by merely simulating the machine forward from MA. We then 
specify MA\ to be a witnessing state of MA if ii is initiated into the pipeline 
at MAi. Notice that since instructions are initiated and completed in-order, 
any instruction before in program order must have been already initiated in 
the pipeline in state MA\. Since flushing merely involves executing the pipeline 
without initiating any instruction, flushing from state MA\ will therefore produce 
the state MAip with the same visible behavior as state MA. 

4.4 Out-of-Order and Multiple Instruction Completion 

Modern pipelines allow completion of multiple instructions at the same clock cy- 
cle, and out-of-order completion of instructions. Such features cannot be directly 
handled by our approach. In this section, we outline the problem and discuss 
a possble approach. We admit, however, that we have not attempted to verify 
pipelines with these features and our comments are merely speculative. 

Consider a pipeline state MA poised to complete two instructions i\ and 
at the same cycle. Assume that i\ updates register r\ and 12 updates register T2. 
Thus the visible behavior of the pipeline will show simultaneous updates of the 
two registers. The ISA, however, can only update one register at any clock cycle. 
Thus, there can be no ISA state ISA with the properties that (i) MA and ISA 
are externally equal, and (ii) executing both machines from MA and ISA results 
in states that are externally equal, even with possible stutter. In other words, 
there can be no simulation relation relating MA and ISA. The arguments are 
applicable to out-of-order completion as well, where the updates corresponding 
to the two instructions are “swapped” in the pipelined machine. 

Since multiple and out-of-order completion affect the visible behavior of the 
pipelined machine, we need to modify the execution of the ISA to show corre- 
spondence. We propose the following approach: The ISA, instead of executing 
single instructions atomically, non-deterministically selects a burst of instruc- 
tions. Each burst consists of a set of instruction sequences with instructions in 
different sequences not having any data dependency. The ISA then executes 
each sequence in a burst atomically, choosing the order of the sequences non- 
deterministically, and then selects the next burst after completing all sequences. 
Notice that our “original” ISA can be derived from such an ISA by letting the 
bursts be singleton sets of one instruction. 
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We are pursuing the “modified ISA” approach to verify practical pipelines. 
However, we have not verified any complicated machine using this approach, and 
investigation of the technique is a future area of research. 

5 Summary and Conclusions 

We have shown how to apply a uniform and well-studied notion of correspon- 
dence, namely stuttering simulation, to verify pipelined machines. Since the no- 
tion is compositional, proofs of pipelines can be directly composed with proofs of 
other components of a microprocessor. Further, the use of witnessing states en- 
ables us to merely “flush” the pipelined machine to define the simulation relation 
without constructing complicated structural invariants. 

Our work demonstrates the importance of first-order quantification in verifi- 
cation of practical computing systems. We have seen how quantification enables 
us to specify the witnessing state non-constructively by using the Skolem wit- 
ness of a quantified predicate. Since the witnessing state is a state that has been 
encountered in the “past” , constructing the state would require keeping track of 
the history of execution via an intermediate machine, and definition of a compli- 
cated invariant on that machine. The importance of first-order quantification is 
often overlooked in the use of a general-purpose theorem prover, especially by the 
ACL2 community. While ACL2 allows arbitrary first-order quantification, ACL2 
proofs typically use constructive recursive functions. In this context, it should 
be noted that a key advantage of the use of theorem-proving over algorithmic 
decision procedures like model-checking is the expressivity of the logic. Beyond 
this, theorem-proving normally involves more manual effort. Thus to successfully 
use theorem-proving with reasonable automation, it is important to make effec- 
tive use of expressivity. We have found quantification to be particularly useful in 
many circumstances for “backward simulation” , when the property preserved in 
a state is guaranteed by a state encountered by the machine in the past. Aside 
from defining witnessing states for pipelined machines, backward simulation is 
useful, for example, in specifying weakest preconditions for operationally mod- 
eled sequential programs. We are working on identifying other situations in which 
quantification can significantly reduce manual effort in deductive verification of 
modern computer systems. 

We have demonstrated the application of our approach to verify pipelines 
with stalls, interrupts, and out-of-order execution using stuttering simulation. 
As mentioned in Section 4.4, we cannot yet handle out-of-order completion and 
multiple instruction completion. We plan to analyze the extention outlined in 
Section 4.4 to reason about more complicated, practical pipelines. 
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Abstract. On shared memory multiprocessors, synchronization often 
turns out to be a performance bottleneck and the source of poor fault- 
tolerance. Lock-free algorithms can do without locking mechanisms, and 
are therefore desirable. Lock-free algorithms are hard to design correctly, 
however, even when apparently straightforward. We formalize Herlihy’s 
methodology [13] for transferring a sequential implementation of any 
data structure into a lock-free synchronization by means of synchroniza- 
tion primitives Load-linked [LL) / store- conditional {SC). This is done by 
means of a reduction theorem that enables us to reason about the general 
lock-free algorithm to be designed on a higher level than the synchroniza- 
tion primitives. The reduction theorem is based on refinement mapping 
as described by Lamport [10] and has been verified with the higher-order 
interactive theorem prover PVS. Using the reduction theorem, fewer in- 
variants are required and some invariants are easier to discover and easier 
to formulate. 

The lock-free implementation works quite well for small objects. How- 
ever, for large objects, the approach is not very attractive as the burden 
of copying the data can be very heavy. We propose two enhanced lock- 
free algorithms for large objects in which slower processes don’t need to 
copy the entire object again if their attempts fail. This results in lower 
copying overhead than in Herlihy’s proposal. 

Keywords &: Phrases: Distributed algorithms. Lock-free, Simulation, 
Refinement mapping 



1 Introduction 

On shared-memory multiprocessors, processes coordinate with each other via 
shared data structures. To ensure the consistency of these concurrent objects, 
processes need a mechanism for synchronizing their access. In such a system 
the programmer typically has to explicitly synchronize access to shared data 
by different processes to ensure correct behaviors of the overall system, using 
synchronization primitives such as semaphores, monitors, guarded statements, 
mutex locks, etc. Consequently the operations of different processes on a shared 
data structure should appear to be serialized: if two operations execute simul- 
taneously, the system guarantees the same result as if one of them is arbitrarily 
executed before the other. 
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Due to blocking, the classical synchronization paradigms using locks can incur 
many problems such as convoying, priority inversion and deadlock. A lock-free 
(also called non-blocking) implementation of a shared object guarantees that 
within a finite number of steps always some process trying to perform an op- 
eration on the object will complete its task, independently of the activity and 
speed of other processes [13]. As lock-free synchronizations are built without 
locks, they are immune from the aforementioned problems. In addition, lock- 
free synchronizations can offer progress guarantees. A number of researchers [1, 
4, 5, 13-15] have proposed techniques for designing lock-free implementations. 
The basis of these techniques is using some synchronization primitives such as 
compare- and- swap (CAS), or Load-linked {LL) / store- conditional (SC). 

Typically, the implementation of the synchronization operations is left to 
the designer, who has to decide how much of the functionality to implement in 
software using system libraries. The high-level specification gives lots of freedom 
about how a result is obtained. It is constructed in some mechanical way that 
guarantees its correctness and then the required conditions are automatically 
satisfied [3] . We reason about a high-level specification of a system, with a large 
grain of atomicity, and hope to deduce an implementation, a low-level speci- 
fication, which must be fine grained enough to be translated into a computer 
program that has all important properties of the high-level specification. 

However, the correctness properties of an implementation are seldom easy 
to verify. Our previous work [6] shows that a proof may require unreasonable 
amounts of effort, time, or skill. We therefore develop a reduction theorem that 
enables us to reason about a lock-free program to be designed on a higher level 
than the synchronization primitives. The reduction theorem is based on refine- 
ment mappings as described by Lamport [10], which are used to prove that a 
lower-level specification correctly implements a higher-level one. Using the re- 
duction theorem, fewer invariants are required and some invariants are easier to 
discover and easier to formulate, without considering the internal structure of 
the final implementation. In particular, nested loops in the algorithm may be 
treated as one loop at a time. 

2 Lock-Free Transformation 

The machine architecture that we have in mind is based on modern shared- 
memory multiprocessors that can access a common shared address space. There 
can be several processes running on a single processor. Let us assume there are 
1) concurrently executing sequential processes. 

Synchronization primitives LL and SC, proposed by Jensen et al. [2], have 
found widespread acceptance in modern processor architectures (e.g. MIPS II, 
PowerPC and Alpha architectures). They are a pair of instructions, closely re- 
lated to the CAS, and together implement an atomic Read/Write cycle. Instruc- 
tion LL first reads a memory location, say X, and marks it as “reserved” (not 
“locked”). If no other processor changes the contents of X in between, the subse- 
quent SC operation of the same processor succeeds and modifies the value stored; 
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otherwise it fails. There is also a validate instruction VL, used to check whether 
X was not modified since the corresponding LL instruction was executed. Imple- 
menting V L should be straightforward in an architecture that already supports 
SC. Note that the implementation does not access or manipulate X other than 
by means of LL/ SC /VL. Moir [12] showed that LL/ SC /VL can be constructed 
on any system that supports either LL/SC or CAS. A shared variable X only 
accessed by LL/ SC /VL operations can be regarded as a variable that has an 
associated shared set of process identifiers V.X, which is initially empty. The 
semantics of LL, VL and SC are given by equivalent atomic statements below. 

proc LL(ref X : val) : val = 

( V.X := V.X U {self}; return X; ) 

proc VL(ref X : val) : boolean = 

( return {self € V.X) ) 

proc SC {ref X : val; in Y : val) : boolean = 

( if self € V.X then V.X := 0; X :=Y; return true 
else return false; fi ) 

where self is the process identifier of the acting process. 

At the cost of copying an object’s data before an operation, Herlihy [13] 
introduced a general methodology to transfer a sequential implementation of 
any data structure into a lock-free synchronization by means of synchronization 
primitives LL and SC. A process that needs access to a shared object pointed 
by X performs a loop of the following steps: (1) read X using an LL operation 
to gain access to the object’s data area; (2) make a private copy of the indicated 
version of the object (this action need not be atomic); (3) perform the desired 
operation on the private copy to make a new version; (4) finally, call a SC 
operation on X to attempt to swing the pointer from the old version to the new 
version. The SC operation will fail when some other process has modified X 
since the LL operation, in which case the process has to repeat these steps until 
consistency is satisfied. The algorithm is non-blocking because at least one out 
of every P attempts must succeed within finite time. Of course, a process might 
always lose to some faster process, but this is often unlikely in practice. 

3 Reduction 

We assume a universal set V of typed variables, which is called the vocabulary. 
A state s is a type-consistent interpretation of V, mapping variables f G V to 
values s|v]. We denote by X the set of all states. If C is a command, we denote 
by Cp the transition C executed by process p, and s|Cp]t indicates that in state s 
process p can do a step C that establishes state t. When discussing the effect of a 
transition Cp from state s to state t on a variable v, we abbreviate s|f] to v and 
t|w] to v' . We use the abbreviation Pres{V) for /\.^,^y{v' = v) to denote that 
all variables in the set V are preserved by the transition. Every private variable 
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name can be extended with the suffix -|- “process identiS.er'\ We sometimes 
use indentation to eliminate parentheses. 

3.1 Observed Specification 

In practice, the specification of systems is concerned rather with externally visi- 
ble behavior than computational feasibility. We assume that all levels of specifi- 
cations under consideration have the same observable state space ifo, and are in- 
terpreted by their observation functions U : S ^ Sq. Every specification can be 
modeled as a five-tuple {S, U, 0,Af, C) where {S, 0,J\f) is the transition system 
[16] and C is the supplementary property of the system (i.e., a predicate on E“’). 

The supplementary constraint C is imposed since the transition system only 
specifies safety requirements and has no kind of fairness conditions or liveness 
assumptions built into it. Since, in reality, a stuttering step might actually per- 
form modifications to some internal variables in internal states, we do allow 
stuttering transitions (where the state does not change) and the next-state rela- 
tion is therefore reflexive. A finite or infinite sequence of states is defined to be 
an execution of system (E, 77, 0,Af,C) if it satisfies initial predicate 0 and the 
next-state relation Af but not necessarily the requirements of the supplementary 
property £. We define a behavior to be an infinite execution that satisfies the 
supplementary property C. A (concrete) specification Sc implements a (abstract) 
specification Sa iff every externally visible behavior allowed by Sc is also allowed 
by Sa. We write Beh{S) to denote the set of behaviors of system S. 

3.2 Refinement Mappings 

A refinement mapping from a lower-level specification Sc = (77c, 0ct^ci £c) 
to a higher-level specification Sa = (Ea, Ha, 0a,Afa,£a): written 4> : Sc Q Sa, is 
a mapping (j) : Ec ^ Ea that satisfies: 

1. (j) preserves the externally visible state component: Ua o (f> = lie. 

2. (j) is a simulation, denoted <p : Sc =4 Sa- 

il) 4> takes initial states into initial states: 0c ^ 0a ° 4>- 
® Afc is mapped by (f) into a transition (possibly stuttering) allowed by Na' 
Q A Nc =k Na o (j), where Q is an invariant of Sc- 

3. (j) maps behaviors allowed by Sc into behaviors that satisfy 5a ’s supplemen- 
tary property: V ct G Beh{Sc) '- £a(</>(c)). 

Below we need to exploit the fact that the simulation only quantifies over all 
reachable states of the lower-level system, not all states. We therefore explicitly 
allow an invariant Q in condition 2 d). The following theorem is stated in [11]. 

Theorem 1. If there exists a refinement mapping from Sc to Sa, then Sc im- 
plements Sa- 

Refinement mappings give us the ability to reduce an implementation by 
reducing its components in relative isolation, and then gluing the reductions 
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together with the same structure as the implementation. Atomicity guarantees 
that a parallel execution of a program gives the same results as a sequential and 
non-deterministic execution. This allows us to use the refinement calculus for 
stepwise refinement of transition systems [8]. Essentially, the reduction theorem 
allows us to design and verify the program on a higher level of abstraction. The 
big advantage is that substantial pieces of the concrete program can be dealt 
with as atomic statements on the higher level. 

The refinement relation is transitive, which means that we don’t have to 
reduce the implementation in one step, but can proceed from the implementation 
to the specification through a series of smaller steps. 



3.3 Correctness 

The safety properties satisfied by the program are completely determined by the 
initial predicate and the next-state relation. This is described by Theorem 2, 
which can be easily verified. 

Theorem 2. Let Vc and Va be safety properties for Sc and Sa respectively. 
The verification of a concrete judgment {Ec,Oc,J^c) |= Tc ean he reduced to 
the verification of an abstract judgment (Aa, 6>a, A/”a) H '^a, if we can exhibit a 
simulation (j mapping from Sc to Sa that satisfies Va ° 4> ^ Vc 

We make a distinction between safety and liveness properties (See [10] for the 
proof schemes). The proof of liveness relies on the fairness conditions associated 
with a specification. The purpose for fairness conditions is to rule out executions 
where the system idles indefinitely with control at some internal point of a pro- 
cedure and with some transition of that procedure enabled. Fairness arguments 
usually depend on safety properties of the system. 

4 A Lock-Free Pattern 

We propose a pattern that can be universally employed for a lock-free construc- 
tion in order to synchronize access to a shared node of nodeType. The interface 
Sa is shown in Fig. 1, where the following statements are taken as a schematic 
representation of segments of code: 

1. noncrit{ref pub : aType, priv : bType] in tm : cType; out x : 1..N) : 
representing an atomic non-critical activity on variables pub and priv ac- 
cording to the value of tm, and choosing an index a; of a shared node to be 
accessed. 

2. guard{±n X : nodeType, priv : bType) a non-atomic boolean test on the 
variable X of nodeType. It may depend on private variable priv. 

3. com(ref X : nodeType; in priv : bType; out tm : cType) : a non-atomic 
action on the variable X of nodeType and private variable tm. It is allowed 
to inspect private variable priv. 
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CONSTANT 

P = number of processes; N = number of nodes 
Shared Variables: 

pub: aType; Node: array [1..N] of nodeType; 

Private Variables : 

priv: bType; pc: {al, a2} ; x: 1..N; tm: cType; 

Program: 

loop 

al: noncritCpub, priv, tm, x) ; 

a2: (if guard (Node [x] , priv) then com(Node[x], priv, tm) ; fi ) 

end 

Initial conditions &a : V p:l..P: pc = al 
Liveness Ca ■ LI ( pc = a2 — > O pc = al ) 



Fig. 1. Interface Sa 



CONSTANT 

P = number of processes; N = number of nodes 
Shared Variables: 



pub: aType; node: array [1..N+P] 
indir: array [1..N] of 1..N+P; 


of nodeType 


Private Variables : 




priv: bType; pc: [cl.. c7] ; 




x: 


1..N; mp, m: 1..N+P; tm, tml: 


cType ; 


Program: 






loop 




cl 


noncritCpub, priv, tm, x) ; 
loop 




c2 


m := LL ( indir [x] ) ; 




c3 


read (node [mp] , node [m] ) ; 




c4 


if guard (node [mp] , priv) 


then 


c5 


comCnode [mp] , priv, tml); 


c6 


if SC ( indir [x], mp) then 



mp := m; tm := tml; break; 
fi 



c7 : else 

if VL(indir[x]) then break; fi 
fi 
end 
end 

Initial conditions ©c : 

(V p:l..P: pc = cl A mp=N+p) A (V i:l..N: indir[i]=i) 
Liveness Ca ■ □ { pc = c2 > <> pc = cl ) 



Fig. 2. Lock-free implementation Sc of <Sa 



The action enclosed by angular brackets (...) is defined as atomic. The private 
variable x is intended only to determine the node under consideration, the private 
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variable tm is intended to hold the result of the critical computation com, if 
executed. By means of Herlihy’s methodology, we give a lock- free implementation 
Sc of interface Sa in Fig. 2. In the implementation, we use some other schematic 
representations of segments of code, which are described as follows: 

4. read{ref X : nodeType, in Y : nodeType) : a non-atomic read operation 
that reads the value from the variable Y of nodeType to the variable X of 
nodeType, and does nothing else. If Y is modified during read, the resulting 
value of X is unspecified but type correct, and no error occurs. 

5. LL, SC and VL : atomic actions as we defined before. 

Typically, we are not interested in the internal details of these schematic com- 
mands but in their behavior with respect to lock- freedom. In Sc, we declare P 
extra shared nodes for private use (one for each process). Array indir acts as 
pointers to shared nodes. node[mp.p] can always be taken as a “private” node 
(other processes can read but not modify the content of the node) of process 
p though it is declared publicly. If some other process successfully updates a 
shared node while an active process p is copying the shared node to its “private” 
node, process p will restart the inner loop, since its private view of the node is 
not consistent anymore. After the assignment mp := m at line c6, the “private” 
node becomes shared and the node shared previously (which contains the old 
version) becomes “private”. 

Formally, we introduce Nc as the relation corresponding to command noncrit 
on {aTypexbTypexcType, aTypexbTypexl..N), Pg as the predicate computed 
by guard on nodeType x bType, Rc as the relation corresponding to com on 
{nodeType x bType, nodeType x cType), and define 

Sa — {Node[l..N], pub) x {pc, x, priv, tm)^ , 

Sc = {node[l..N+P\, indir[l..N\, pub) x {pc, x, mp, m, priv, tm, tml)^ , 
Ua{Sa) = {Node[l..N], pub), IIc{Sc) = {node[indir[\..N]], pub), 

A/"a = Vo<i<2-^a* J = \! \<i <7 ■^ci , 

The transitions of the abstract system can be described: Vs,t : Sa, p : 1..P: 
s[(Afao)plf — s = t (to allow stuttering) 

s\{-N'ai)p\t — pc.p = al Spc'.p = a2 S Presiy — {pub, priv.p, pc.p, x.p}) 

A {{pub, priv.p, tm.p), {pub, priv.p, x.p)') G Nc 
sI(Afa 2 )plf — pc.p = a2 A pc' .p = al A {Pg{Node[x\, priv.p) 

A {{Node[x], priv.p), {Node[x], tm.p)') G Rc 
A Pres{V — {pc.p, Node[x], tm.p}) 

\/ ^Pg{Node[x], priv.p) A Pres{V — {pc.p})). 

The transitions of the concrete system can be described in the same way. Here 
we only provide the description of the step that starts in c6: Vs, t : Ac, p : 1..P: 

s[(Afc6)plf — PC-P = c6 A {p G V.indir[x.p] 

A pc'.p = cl A {indir[x .p])' = mp.p A mp' .p = m.p 
A tm' .p = tml.p A {V.indir[x.p])' = 0 
A Pres{V — {pc.p, indir[x.p], mp.p, tm.p, V.indir[x .p]}) 

V p ^ V.indir[x.p] A pc' .p = c2 A Pres{V — {pc.p})) 
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4.1 Simulation 

According to Theorem 2, the verification of a safety property of concrete system 
Sc can be reduced to the verification of the corresponding safety property of 
abstract system Sa if we can exhibit the existence of a simulation between them. 

Theorem 3. The concrete system Sc defined in Fig. 2 is simulated by the ab- 
stract system Sa defined in Fig. 1, that is, 3<f> \ Sc ^ Sa. 

Proof: We prove Theorem 3 by providing a simulation. The simulation function 
(j) is defined by showing how each component of the abstract state (i.e. state 
of Sa) is generated from components in the concrete state (i.e. state of Sc). 
We define : the concrete location cl is mapped to the abstract location al, 
while all other concrete locations are mapped to a2; the concrete shared vari- 
able node[indir[x\] is mapped to the abstract shared variable Node[x], and the 
remaining variables are all mapped to the identity of the variables occurring in 
the abstract system. 

The assertion that the initial condition Oc of the concrete system implies the 
initial condition 0a of the abstract system follows easily from the definitions of 
Oc, Oa and fi. 

The central step in the proof of simulation is to prove that every atomic 
step of the concrete system simulates an atomic step of the abstract system. 
We therefore need to associate each transition in the concrete system with the 
transition in the abstract system. 

It is easy to see that the concrete transition Afd simulates Afai and that 
A/"c 2 , A/"c 3 , A/"c 4 , A/"c 5 , A/”c 6 with precondition “self ^ V.indir[x.self]” , and A/”c 7 
with precondition “self ^ V.indir[x.self]” simulate a stuttering step Afao in the 
abstract system. E.g., we prove that Afce executed by any process p with precon- 
dition “p ^ V.indir[x.pf^ simulates a stuttering step in the abstract system. By 
the mechanism of SC, an active process p will only modify its program counter 
pc.p from c6 to c2 when executing A/”c6 with precondition “p ^ V.indir[x.p\” . 
According to the mapping of (j), both concrete locations c6 and c2 are mapped 
to abstract location o2. Since the mappings of the pre-state and the post-state to 
the abstract system are identical, Afce executed by process p with precondition 
“p ^ V.indir[x.p\' simulates the stuttering step Nao in the abstract system. 

The proof for the simulations of the remaining concrete transitions is less 
obvious. Since simulation applies only to transitions taken from a reachable 
state, we postulate the following invariants in the concrete system Sc- 

Ql: (pyf q =k mp.p mp.q) A {indir[y] yf mp.p) 

A {y z ^ indir[y] yf indir[z\) 

Q2: pc.p = c6 A p € V.indir[x.p] 

=k {{node[m.p\, priv.p), {node[mp.p\, tml.p)) € Rc 
Q3: pc.p = cl A p &V.indir[x.p] =k ^ Pg{node[m.p], priv.p) 

Q4: pc.p € [c3..c7] A p € V.indir[x.p] =k m.p = indir[x.p] 

Q5: pc.p € {c4, c5} A p € V.indir[x.p] =k node[m.p] = node[mp.p] 

Q6: pc.p = {c5, c6} Pg{node[mp.p], priv.p) 
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In the invariants, the free variables p and q range over I..P, and the free variables 
y and z range over 1..7V. Invariant Q1 implies that, for any process q, node[mp.q] 
can be indeed treated as a “private” node of process q since only process q 
can modify that. Invariant Q4 reflect the mechanism of the synchronization 
primitives LL and SC. 

With the help of those invariants above, we have proved that Ncq and Afc 7 
executed by process p with precondition “p G V.indir[x.p]” simulate the abstract 
step J\fa 2 in the abstract system. For reasons of space we refer the interested 
reader to [7] for the complete mechanical proof. □ 



4.2 Refinement 



Recall that not all simulation relations are refinement mappings. According to 
the formalism of the reduction, it is easy to verify that (j) preserves the externally 
visible state component. For the refinement relation we also need to prove that 
the simulation (f> maps behaviors allowed by Sc into behaviors that satisfy Sa’s 
liveness property, that is, Vct G Beh{Sc) '■ La{4‘{o')). Since (j) is & simulation, we 
deduce 



a \= Cc 



a ^ □(pc = c2 — > <>pc = cl) 
a 1= □(pc G [c2..c7] — > <>pc = cl) 
(f>{a) \= □(pc = a2 — > Opc = al) 



Consequently, we have our main reduction theorem: 

Theorem 4. The abstract system Sa defined in Fig. 1 is refined by the concrete 
system Sc defined in Fig. 2, that is, 3(j) \ Sc Q Sa. 

The liveness property Lc of concrete system Sc can also be proved under the 
assumption of the strong fairness conditions and the following assumption: 

□ (□pc.p G [c2..c7] A □Op G V.indir[x.p] 

— > 0(pc.p = c6 V pc.p = cl) A p G V.indir[x.p\). 

The additional assumption indicates that for every process p, when process p re- 
mains in the loop from c2 to c7 and executes c2 infinitely often, it will eventually 
succeed in reaching c6 or c7 with precondition “p G V.indir[x.pY . 



5 Large Object 

To reduce the overhead of failing non-blocking operations, Herlihy [13] proposes 
an exponential back-off policy to reduce useless parallelism, which is caused 
by failing attempts. A fundamental problem with Herlihy’s methodology is the 
overhead that results from making complete copies of the entire object (c3 in 
Fig. 2) even if only a small part of an object has been changed. For a large object 
this may be excessive. 
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We therefore propose two alternatives given in Fig. 3. For both algorithms the 
fields of the object are divided into W disjoint logical groups such that if one field 
is modified then other fields in the same group may be modified simultaneously. 
We introduce an additional field ver in nodeType to attach version numbers 
to each group to avoid unnecessary copying. We assume all version numbers 
attached to groups are positive. As usual with version numbers, we assume that 
they can be sufficiently large. We increment the version number of a group each 
time we modify at least one member in the group. 

All schematic representations of segments of code that appear in Fig. 3 are 
the same as before, except 

3. com(ref X : nodeType] in g : 1..W, priv : bType; out tm : cType) : 
performs an action on group g of the variable X of nodeT ype instead of on 
the whole object X. 

4. read{ref X : nodeType; in Y : nodeType, g : 1..W) : only reads the value 
from group g of node Y to the same group of node X . 

The relations corresponding to these schematic commands are adapted accord- 
ingly. 

In the first implementation, mp becomes an array used to record pointers to 
private copies of shared nodes. In total we declare N * P extra shared nodes for 
private use (one for each process and each node). Note that node[mp[x].p] can 
be taken as a “private” node of process p though it is declared publicly. Array 
indir continues to act as pointers to shared nodes. 

At the moment that process p reads group i.p of node[m.p] (line ^5), process 
p may observe the object in an inconsistent state (i.e. the read value is not the 
current or historical view of the shared object) since pointer m.p may have been 
redirected to some private copy of the node by some faster process q, which 
has increased the modified group’s version number(in lines 19 and ?10). When 
process p restarts the loop, it will get higher version numbers at the array new, 
and only needs to reread the modified groups, whose new version numbers differ 
from their old version numbers. Excessive copying can be therefore prevented. 
Line 16 is used to check if the read value of a group is consistent with the version 
number. 

The first implementation is fast for an application that often changes only a 
small part of the object. However, the space complexity is substantial because 
P-|- 1 copies of each node are maintained and copied back and forth. Sometimes, 
a trade-off is chosen between space and time complexity. We therefore adapt 
it to our second lock-free algorithm for large objects (shown in Fig. 3 also) by 
substituting all statements enclosed by (*...*) for the corresponding statements 
in the first version. As we did for small objects, we use only one extra copy of a 
node for each process in the second implementation. 

In the second implementation, since the private copy of a node may belong 
to some other node, a process first initializes all elements of old to be zero (line 
H) before accessing an object, to force the process to make a complete copy of 
the entire object for the first attempt. The process then only needs to copy part 
of the object from the second attempt on. The space complexity for our second 
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CONSTANT 

P = number of processes; N = number of nodes; 

W = number of groups ; 

K = N + N*P; (*II:K = N + P;*) 

Type nodeType = record 

val: array [1..W] of valType; 
ver: array [1..W] of posnat; 
end 

Shared Variables: 

pub: aType; node: array [1..K] of nodeType; 
indir: array [1..N] of 1..K; 

Private Variables : 

priv: bType; pc: [iL.Zll]; 
x: 1. .N; m: 1. .K; 

mp: array [1 . .N] of 1 . .K; (* II: mp: 1..K; *) 

new: array [1..W] of posnat; old: array [1..W] of nat; 
g: 1..W; tm, tml : cType; i: nat; 

Program: 

loop 

ll\ noncritCpub, priv, tm, x) ; 

choose group g to be modified; 

old := node [mp [x] ]. ver ; (* II: old := A (i:l..W): 0; *) 

(* II: replace all ‘‘mp[x]’’ below by ‘‘mp’’ *) 

loop 

12 \ m := LL (indir [x] ) ; 

13: i := 1 

14: while i < W do 

new[i] := node [m] .ver [i] ; 
if new[i] ^ old[i] then 

15: read (node [mp [x] ] , node [m] , i) ; old[i] := 0; 

16: if not VL(indir[x]) then goto 12; fi; 

17: node [mp[x]] .ver [i] := new[i]; old[i] := new[i]; 

fi; 
i++; 
end; 

18: if guard (node [mp [x] ] , priv) then 

19: com (node [mp [x] ] , g, priv, tml); old[g] := 0; 

node [mp [x] ]. ver [g] := new[g]+l; 

ZIO: if SC ( indir [x], mp [x] ) then 

mp[x] := m; tm := tml; break; 
fi 

Zll: elseif VL(indir[x]) then break; 

fi 
end 
end 



Fig. 3. Lock-free implementation I (* implementation II *) for large objects 
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version saves {N — 1) x P times of size of a node, while the time complexity is 
more due to making one extra copy of the entire object for the first attempt. To 
see why these two algorithms are correct, we refer the interested reader to [7] 
for the complete mechanical proof. 

6 Conclusions 

This paper shows an approach to verification of simulation and refinement be- 
tween a lower-level specification and a higher-level specification. It is motivated 
by our present project on lock-free garbage collection. Using the reduction theo- 
rem, the verification effort for a lock-free algorithm becomes simpler since fewer 
invariants are required and some invariants are easier to discover and easier to 
formulate without considering the internal structure of the final implementation. 
Apart from safety properties, we have also considered the important problem of 
proving liveness properties using the strong fairness assumption. 

A more fundamental problem with Herlihy’s methodology is the overhead 
that results from having multiple processes that simultaneously attempt to up- 
date a shared object. Since copying the entire object can be time-consuming, 
we present two enhanced algorithms that avoid unnecessary copying for large 
objects in cases where only small part of the objects are modified. It is often 
better to distribute the contents of a large object over several small objects to 
allow parallel execution of operations on a large object. However, this requires 
that the contents of those small objects must be independent of each other. 

Formal verification is desirable because there could be subtle bugs as the 
complexity of algorithms increases. To ensure our hand-written proof presented 
in the paper is not flawed, we use the higher-order interactive theorem prover 
PVS for mechanical support. PVS has a convenient specification language and 
contains a proof checker which allows users to construct proofs interactively, to 
automatically execute trivial proofs, and to check these proofs mechanically. For 
the complete mechanical proof, we refer the reader to [7]. 
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Abstract. Model checking algorithms can report a property as being 
true for reasons that may be considered vacuous. Current algorithms for 
detecting vacuity require either checking a quadratic size witness formula, 
or multiple model checking runs; either alternative may be quite expen- 
sive in practice. Vacuity is, in its essence, a problem with the justification 
used by the model checker for deeming the property to be true. We argue 
that current definitions of vacuity are too broad from this perspective 
and give a new, narrower, formulation. The new formulation leads to a 
simple detection method that examines only the justification extracted 
from the model checker in the form of an automatically generated proof. 
This check requires a small amount of computation after a single verifi- 
cation run on the property, so it is significantly more efficient than the 
earlier methods. While the new formulation is stronger, and so reports 
vacuity less often, we show that it agrees with the current formulations 
for linear temporal properties expressed as automata. Differences arise 
with inherently branching properties but in instances where the vacuity 
reported with current formulations is debatable. 



1 Introduction 

The problem of detecting a vacuous model check of a property has received much 
attention in the literature [2,3,17,24,6,1]. A vacuous model check often indi- 
cates a problem, either with the precise formulation of an informal correctness 
property, or with the program itself. A classic example is that of the property: 
“Every request is eventually granted”, which is true vacuously of a program 
in which no request is ever made! Most model checkers produce independently 
checkable evidence for a negative answer in the form of a counterexample trace, 
but typically are not designed to produce any such evidence of their reasoning 
for success. Without further checking, therefore, one may end up trusting in 
vacuously justified properties. 

Several algorithms for vacuity detection have been proposed in the papers 
above. Essentially, these algorithms look for a sub-formula of the correctness 
property whose truth value does not matter for property verification (a pre- 
condition is that the property has been verified). In [3], vacuity is defined as 
follows: a program M satisfies a formula 4> vacuously iff M satisfies 4>, and there 
is some subformula ip of (j) such that ip does not affect the satisfaction of (p - this 
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last condition holds if the formula obtained by replacing '0 with any formula ^ 
(this is written as := ^]) is true of M. The detection method is to check 
an automatically generated witness formula for each subformula 0, one that 
demonstrates redundancy of tp if it is true - the witness formula for the property 
above is “No request is ever made”. In [17], this test is simplified and generalized 
to all of CTL* from the fragment of ACTL considered in [3]. It is shown that 
to determine whether ip affects (p, it suffices to check whether cp[ip := false] 
{(p[ip := true]) holds of M if 0 occurs with positive (negative) polarity in 0 
(i.e., under an even (odd) number of negations). These methods treat multiple 
occurrences of the same subformula independently. A recent paper [1] extends 
the method above to take into account such dependencies. We adopt their term, 
formula vacuity, to refer to the definition that treats subformulas independently. 

A major drawback of these algorithms is that they either require multiple 
model checking runs to test each subformula, or a single run checking a quadrat- 
ically long witness formula. (The witnesses are of linear length for the w-ACTL 
fragment of [3].) Although the cost of model checking increases only linearly 
in the length of the formula, even a linear blowup can be quite significant in 
terms of the resources used (time, space, processors), since it is common for 
large verification runs to take hours to run on a single formula. Remedies have 
been proposed: in [1], formula structure is used to reduce the number of witness 
checks, and [24, 6] show how to share intermediate results while checking a single 
witness formula. 

A vacuous verification is, in its essence, a problem with the justification used 
by the model checking algorithm for deeming the property to be true. Recent 
work has shown how to extract and present such this justification in the form of 
a deductive proof (see [21,22] and the references therein). The central question 
this paper discusses is whether one can analyze this justification alone in order 
to detect a vacuous verification? The premise seems reasonable, but there are 
subtleties involved in making it work, and in determining the precise relationship 
to formula vacuity. 

To make this relationship clearer, we examine formula vacuity from the view- 
point of justifications. We show that a mu-calculus formula 0 is formula- vacuous 
if, and only if, there exists a valid correctness proof showing that 0 is true of M 
in which the invariant for some subformula contains only unreachable states or 
is empty. Call such a proof vacuous. Formula vacuity, then, is equivalent to ask- 
ing the question: “Does there exist a vacuous correctness proof?” . Viewed from 
this angle, the criterion appears rather odd: why should one discard a valid, 
non- vacuous correctness proof, just because there is an alternative proof that is 
vacuous? Examples given later in the paper support this conclusion, by showing 
that formula vacuity can sometimes produce debatable reports of vacuity. 

We propose a new, stronger, formulation of vacuity, which we call proof vacu- 
ity. This consists of checking whether the correctness proof produced by the 
model checker as justification is vacuous. Since our vacuity criterion is stronger 
than formula vacuity, one may expect the new check to report vacuity less often. 
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However, we show that this may happen only for properties that are inherently 
branching in nature. 

Besides correcting some anomalies in the formula vacuity criterion, the new 
formulation is, moreover, significantly more efficient to check in practice. All that 
is required is to examine for emptiness the invariants produced during a model 
checking run. As these invariants are readily available, this process is even simpler 
and requires less resources than the generation of a full proof of correctness. As 
the two formulations coincide for linear time automaton properties, this also gives 
a significantly more efficient way of checking formula vacuity for such properties. 

The next section introduces the concept of proof used in new formulation. 
In Section 3, we characterize formula vacuity in terms of proofs, motivate and 
present proof vacuity, and show their equivalence for automaton representations 
of linear time properties. Section 4 discusses related work, and some future re- 
search directions, including “partial vacuity” detection. 

2 Preliminaries 

We assume that properties are specified as alternating tree automata. In this 
section, we define the automaton syntax and semantics, and also give the proof 
system used to represent the justification of model checking results. 

Transition Systems. We represent a program by the transition system (TS, for 
short) that it generates - this is also called a Kripke Strueture. A TS is a tuple 
(5, 5, R, L), where S' is a non-empty set of states, S C S is the set of initial states, 
i? C S X S is a transition relation, and L : S — > 2^^ (AP is a set of atomic 
propositions) is a labeling function that maps each state to the the propositions 
true at that state. We assume that R is total: i.e., for every s, there exists t such 
that (s,t) G R. 

Temporal Logics. While our results are based on alternating tree automata, we 
also consider properties defined in one of several well-known temporal logics. The 
syntax of these logics is defined below; please see [8] for the precise semantics. 

LTL [23] , linear temporal logic, is a logic that defines infinite sequences over 
subsets of AP. In positive normal form, formulas of LTL are given by the gram- 
mar: <P ::= P{P G AP) I ^P{P G AP) \ L \ L> \J L> \ X{<P) \ (^U^) | {L>\N<P). 
The temporal operators are X (next-time), U (until), and W (weak-until or un- 
less). Other operators are defined through abbreviation: F(^) (eventually <P) is 
{true[J<I>), and G(^) (always <P) is [<IANfalse). 

CTL* [7], a branching time logic, is given by adding path quantifiers A (over 
all paths) and E (over some path) to LTL; the set of state formulas are defined 
inductively as: <P ::= P(P G AP) | ~^P{P G AP) | ^ A | ^ V | A(i/;) | E(i/;). 
Here, ip is the set of path formulas, which are LTL formulas where atomic propo- 
sitions come from (p. CTL [5] is obtained from CTL* by restricting the set of 
path formulas so that they contain a single, top-level, temporal operator. Its 
sub- logic ACTL (ECTL) is obtained by allowing only the A (E) path quantifier; 
ACTL* (ECTL*) is the analogous restriction of CTL*. 
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The mu-calculus [16] is a branching time temporal logic which subsumes 
the logics defined above [10]. Formulas in positive form are defined using the 
following grammar, where y is a set of symbols denoting fixpoint variables, 
and fi (z/) is the least (greatest) fixpoint operator: ^ ::= P{P € AP) \ Z{Z € 
y) I ^ A ^ I ^ V ^ I EX(<?) I AX(^) I {^Z : <P) \ {vZ : <P). A formula must have 
each variable under the scope of an even number of negation symbols; it is closed 
iff every variable in it is under the scope of a fixpoint operator. The evaluation 
of a formula / with free variables is a subset of S, defined inductively relative 
to a context that maps free variables to subsets of S. A state s in M satisfies a 
closed mu-calculus formula / iff s is in the evaluation of / relative to the context 
that maps every variable to the empty set; M satisfies / (written as M \= f) iff 
all of its initial states satisfy /. 

Alternating Tree Automata. An alternating automaton over AP is specified by a 
tuple (Q, q, 6, F), where Q is a non-empty set of states, q G Q is the initial state, 
i5 is a transition function, and A is a parity acceptance condition. F is given by 
a partition {Fq, Fi, . . . , F 2 n) of Q. An infinite sequence over Q satisfies F iff the 
smallest index i for which a state in Fi occurs infinitely often on the sequence is 
even. We use a simple normal form for the transition relation: it maps an automa- 
ton state and an input from 2^^ to one of { /\ R), {V R),EXr, AXr, true, false, 
where r is a state, and R a non-empty subset of states. 

Given a mu-calculus formula, one can obtain an equivalent automaton whose 
transition relation is just the parse graph of the formula (so that each subformula 
is represented by an automaton state), and the parity condition is determined 
by an analysis of the alternation between least and greatest fixpoints in the 
formula [9]. 

These automata accept computation trees of programs, where tree nodes 
are labeled with subsets of atomic propositions. For a labeled tree t, let At be 
the labeling function that maps tree nodes to their labels. The acceptance of a 
labeled tree t by the automaton is defined in terms of a two-player infinite game. 
A configuration of the game is a pair (x, q), where a; is a tree node and q is an 
automaton state. The moves of the game are as shown in Figure 1. 

A play of the game is a maximal sequence of configurations generated in 
this manner. A play is a win for player I iff either it is finite and ends in a 
configuration that is a win for I, or it is infinite and satisfies the automaton 



At configuration (x, q), based on the form of S{q, Xt{x)), 

— true'. Player I wins, and the play is halted 

— false: Player II wins, and the play is halted 

— (Vi?): Player I picks r G R-, the next configuration is {x, r) 

— ( A i?): Player II picks r G R', the next configuration is {x, r) 

— EXr: Player I picks a child y of a:; the next configuration is {y, r) 

— AXr: Player II picks a child y of x', the next configuration is {y,r) 



Fig. 1. Model Checking Game Moves 



An Efficiently Checkable, Proof-Based Formulation 



61 



acceptance condition. The play is winning for player II otherwise. A strategy for 
player I (II) is a partial function that maps every finite sequence of configurations 
to a choice at each player I (II) position. A strategy is a win for player I if every 
play following that strategy is a win for I, regardless of the strategy for II. The 
automaton accepts the tree t iff player I has a winning strategy for the game 
starting at (e,g), where e is the root node of the tree t. M satisfies A (written 
as M \= A) iff the automaton accepts the computation trees obtained from each 
initial state by “unwinding” the transition relation of M. 

The Proof System. The proof system presented in [21] is as follows. For a TS 
M = (S', S, R, L) and automaton A = (Q, q, S, F), a proof (0, p, W) of A over M 
is given by specifying (i) [Invariants] for each automaton state q, an invariant 
predicate fig (i.e., a subset of S), and (ii) [Rank Functions] for each automaton 
state q, a partial rank function, pg, from S to a well-founded set (VF,^), with 
induced rank relation <1^ over W x W. For vacuity checking, the precise nature 
of the rank functions and relations is not important^. The invariants and rank 
function must satisfy the three local conditions given in Figure 2 for the proof 
to be valid. In these conditions, the variable w ranges over elements of W, the 
notation [/] means that / is valid, and a term of the form pr <lq w {pr = w) 
represents the predicate (As : Pr(s) <ig w) ((As : Pr{s) = w)). 



— Consistency: {pq is defined for every state in fq) For each q G Q, [(j)q ^ 
{3w : {pq = w))] 

— Initiality: (the initial states satisfy the initial invariant) [S <j>q] 

— Invariance and Progress: For each q G Q, depending on the form of S{q,l), 
where I is a propositional formula over AP, check the following. 

• true: nothing to check. 

• false- [<f)q -iZ] 

• {y R)-. [4>q A I A {pq = w) => {y r : r G R ■■ fr A {pr <\q ui))] 

• {AR)-. [fqAlA{pq=w) =^> (Ar : r G R ■■ 4>r A (pr <qW))] 

• EXr: [fq A I A {pq = lu) => EX(</>i. A {pr <\q rc))] 

• AXr: [4>q A I A {pq = w) ^ AX{<j>r A {pr w))] 



Fig. 2. The Proof System Rules 



Theorem 0 ([21]) (Soundness) If there is a valid proof of A over M, then 
M \= A. (Completeness) If M \= A, there is a valid proof for A over M. 

The proof of the completeness theorem is important to vacuity detection, 
so we sketch it here. This proof considers the “run tree” formed by a winning 
strategy in the game, where the tree contains a single player I move at a node 

^ For F of size 2N -|- 1, IT is the product of N well-founded sets and ^ 

is the lexicographic order induced by -<n- For a,b G W, a <lq b holds if, 

and only if, for the unique k such that q is in Fk, either k = 2i, for some i, and 
(ai, ■ . ■ ,ai) -A (&!,..., bi), or fc = 2i — 1, for some i, and (ai , . . . ,ai) -< (&i, . . . , bi). 
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labeled by a configuration for player I, and all player II moves at a node labeled 
for player II. The choices of a move for player I are resolved uniquely by the 
winning strategy. One can extract a proof from this run tree. In particular, the 
invariant <j)q for q is the set of program states s for which there is a configuration 
of the form (a;, q) in the run tree where x contains s. 

3 Defining and Detecting Vacuity 

We assume that all formulas are in positive normal form. Let be a closed 
mu-calculus formula. For a subformula '0 of we define the strict positive 
subformulas of 0 to be those subformulas of ip that are not fixpoint variables 
from a scope outside that of ip. For instance, in the fairness property {vZ : {fjY : 
EX(P A Z) V EX(F))) (“There exists a path where P holds infinitely often”), 
Z is not a strict subformula of EX(P A Z), although (P A Z) is one. 

Theorem 1 Let <L> he a elosed mu-ealeulus formula, and ip he a strict positive 
suhformula. Let A<p he the automaton formed from the parse graph of <P, with 
state q corresponding to ip. Then, <L[ip := false] holds of M iff there is a proof 
that M satisfies A.p in which the invariant for q is the empty set. 

Proof, (left-to-right) Let <P' = d>[ip := false]. The parse graph for can be 
obtained from that of (P by deleting nodes that correspond to strict subformulas 
of Ip, and replacing ip with false. Thus, an automaton for P, A^' , can be obtained 
from A^, by deleting states that correspond to strict sub-formulas of ip, and by 
setting S'{q, 1) = false for all 1. As <P' holds for M, by the completeness theorem, 
there is a valid deductive proof {<p' , p' , W] which, of course, is based only on the 
states in A<p'. As this proof is valid, the assertion [0^ ~^l] must be valid for 

all 1; thus, (p'g = 0 . Now we form a proof {(p,p,W) as follows: let W = W, and 
for r in A<p' , let 0^ = (p'r Pr = p'^. For any other state p, let 0p = 0 , and let 
Pp be some fixed element of W (the choice is arbitrary) . 

We claim that this is a valid proof for A$ over M. Since each state p missing 
from A^' represents a strict subformula of 0, it is reachable only from q] hence, 
setting its invariant to the empty set does not affect the validity of the checks 
for states of A^' . Furthermore, since the invariants are set to empty, the own 
checks for these states, which have the form [(0p A a) => /?], are trivially valid. 
Hence, the new proof is valid, and it has an empty invariance set for q. 

(right-to-left) Suppose that there is a proof {(p,p,W), where (pq = 0- Then, 
the rule for q holds trivially. Consider a state p that corresponds to a strict 
subformula of ip. As p is reachable only from q, its invariant does not affect 
the validity of the check for any state r that does not correspond to a strict 
subformula of ip. Hence, dropping states that correspond to strict subformulas 
of Ip, and replacing S{q, 1) with false for all I, yields a valid proof of A^>' over M. 
By the soundness theorem, this implies that M satisfies <?'. □ 

Given this theorem, the existing methods for vacuity detection can be viewed 
as searching for a vacuous proof. This seems to be too broad a formulation: it 
disregards a non-vacuous proof produced by a model checker simply because 
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Fig. 3. Transition Systems 



there exists an alternate, vacuous justification for the property. To illustrate this 
point further, we consider some examples. 

Existential Properties. The existential analogue of the example defined in the 
introduction is “There exists a computation where every request is eventually 
granted”. This may be formulated in ECTL as EG(reg ^ Ef {grant)). For 
the TS shown in Figure 3(a), the property is true, as on the left-hand path, a 
request is sent and subsequently granted. This witness does make use of every 
subformula, so is non- vacuous. However, replacing the subformula E.F {grant) with 
false, we obtain the property EG(^reg), which is true of the right-hand branch; 
thus, the verification is considered vacuous by the formula-vacuity criterion. This 
is clearly odd: a perfectly reasonable, non- vacuous proof is disregarded because 
of the existence of an alternative, vacuous proof. 

Universal, Non-linear Properties. Consider another property, in this case a 
universal one, but one that is not expressible in linear-time logics: “At ev- 
ery successor state, either inevitably P holds, or inevitably Q holds”, where 
P and Q are atomic propositions. The property can be written in ACTL as 
AX(AF(P) V AF(Q)). Consider the TS shown in Figure 3(b), with the dashed 
arrow representing a long path to a state satisfying the target proposition. The 
property is clearly true of the TS. The shortest proof will favor satisfying AF(P) 
for the left successor of the initial state, since it can be done in fewer steps 
than satisfying AF(Q), and it will favor satisfying AF(Q) for the other succes- 
sor, for the same reason. This is a non-vacuous proof, since all subformulas are 
used. However, replacing the subformula AF(Q) by false results in the prop- 
erty AX(AF(P)), which is also true of the TS; hence, the verification is vacuous 
by formula vacuity. This conclusion is debatable: again, a perfectly reasonable 
proof (in fact, the shortest one) is disregarded because of a vacuous proof that 
is non-optimal. 

3.1 Proof Vacuity 

Given these anomalies resulting from the formula- vacuity criterion, we seek to 
define a stronger criterion based on the justification actually given by the model 
checker. One possible approach is to examine only the justification provided 
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by a given model checker for vacuity. This is simple, but has the disadvantage 
of being sensitive to the choice of the model checker. This sensitivity may not 
be a problem if a particular model checker is standardized for use within an 
organization. 

We explore, however, the consequences of a definition that is insensitive to the 
model checking strategy. As a preliminary, given M and Axp, for a mu-calculus 
formula call a state of M game-reachable iff it occurs in some configuration 
of the game for M |= Note that all unreachable states of M are also game- 
unreachable, but that some reachable states of M may also be game-unreachable. 
For instance, the game for the formula ^ = P examines only the initial states 
of M\ thus, all other states are game-unreachable. The proof of the soundness 
theorem in [21] shows that each valid proof defines a history- free winning strategy 
for the model checking game. Thus, removing game-unreachable states from each 
invariant set does not change the validity of a proof. We can now define a proof 
to be vacuous for some subformula as follows. 

Definition 0 (Vacuous Proof) A proof for an automaton property is vacuous 
for automaton state q iff the invariant for q is empty or contains only game- 
unreachable states. A proof is vacuous iff it is vacuous for some automaton 
state. 

Theorem 1 can then be amended so that it is also possible for the invariant 
for q to contain game-unreachable states. We rely on this amended form in the 
subsequent discussion. We can now define proof vacuity as follows. Notice that 
the crucial difference with formula vacuity is that a verification is considered to 
be vacuous only if all all model checking strategies give rise to vacuous proofs. 

Definition 1 (Proof Vacuity) For an automaton property A and a TS M, the 
verification of A on M is proof-vacuous iff for some automaton state q, every 
proof that M satisfies A is vacuous for q. 

The rationale for excluding game-unreachable, rather than just unreachable 
states is as follows. Consider the property AX(P ^ AX(Q)). This is true 

vacuously of a TS M for which all successors of the initial state satisfy ~^P. 
However, there may be a proof where the invariant for the state corresponding 
to AX(Q) includes reachable states of M that satisfy AX(Q); thus, the weaker 
criterion would not properly detect the vacuous verification. Game-reachability 
is not that stringent a criterion; for instance, COSPAN [12], a symbolic model 
checker for linear-time automaton properties, restricts the model checking to the 
reachable states of the product of the TS and a negated property automaton. 
This product defines the game configurations: thus, all invariants include only 
game-reachable states. 

Although proof- vacuity has a universal quantification over all proofs, we can 
show that it suffices to consider a maximally inclusive proof, produced by in- 
cluding every game reachable state that satisfies an automaton state q m (fq - 
this is just the proof provided by the completeness theorem. 
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Theorem 2 For an automaton property A and TS M , the property is true proof- 
vacuously for M iff the maximally inclusive proof is vacuous. 

Proof. It is slightly easier to show the contrapositive. Suppose that the property 
is true non-proof-vacuously. Then there is a non- vacuous proof, one in which the 
invariant for every automaton state contains some game-reachable state. This 
state must be present, by inclusiveness, in the invariant for that automaton state 
in the maximally inclusive proof. Hence, the maximally inclusive proof is non- 
vacuous. In the other direction, if the maximally inclusive proof is non- vacuous, 
the verification is non-proof- vacuous by definition. □ 

Vacuity Checking. Suppose that the states of M are represented symbolically by 
a vector of Boolean variables b, and that the states of A<p are represented by a 
vector of Boolean variables c. Let Win{b^ c) be the set of reachable, winning game 
configurations that is generated by the model checker, as indicated above for 
COSPAN. The vacuity check then amounts to detecting whether the encoding 
of some valid automaton state q is not associated with any state in Win - 
this means that the invariant for q is empty. Such a check can be performed 
through symbolic (BDD) calculations. One such method is to calculate the set 
of automaton states with non-empty invariants, nonempty{c), as (3b : Win(b., c)), 
and check whether valid nonempty, where valid(c) defines valid automaton 
states. 

An interesting point is that proof-vacuity, as defined, does not take into 
account the distinction between optimal and non-optimal proofs. For instance, 
consider the ACTL formula AX(AF(P) V AF(Q)) and the TS in Figure 3(c). 
An optimal proof, in which player I makes choices that satisfy eventualities as 
early as possible, has an empty invariant for AF(Q); a less-than-optimal proof 
would include the third-level states (which are game-reachable) in the invariant 
for AF(Q). Thus, the verification is non-proof- vacuous due to the existence of the 
less-than-optimal, non-vacuous proof. However, it might be of interest to know 
that the shortest proof of correctness does not use a certain subformula. The 
construction for the completeness theorem can be adjusted to create a proof 
that is optimal (vs. maximally inclusive) as follows. The construction already 
results in ranks that define the shortest path to satisfying eventualites. All that 
remains is to prune the maximally inclusive invariants so that states that do not 
contribute to the shortest path to fulfilling any eventuality (e.g., the third- level 
states in this example) are eliminated from the invariants. It is not clear, though, 
whether an analogue of Theorem 2 can be obtained. Note that a discussion of 
optimality differences is based on quantifying progress through rank functions; 
progress concerns have so far not played a role in the identification of vacuous 
justifications. As shown below, differences in optimality are of concern only for 
certain branching-time properties. 

Theorem 3 For a mu-calculus property (f> and TS M , if there is a unique run 
tree for the model checking game on M and A^j,, then formula-vacuity for cf 
coincides with proof-vacuity (optimal-proof-vacuity) for A(f,. 
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Proof. Each run tree corresponds to a winning strategy for player I, which is 
history-free (due to the parity winning condition) and so corresponds to a proof 
(cf. [21]). As there is a unique run tree, there is only a single valid proof. If 
(j) is true formula- vacuously, this proof is vacuous, and hence the verification is 
also proof- vacuous, since there are no other valid proofs to consider. The other 
direction follows as proof-vacuity is stronger than formula-vacuity. □ 

This theorem has some interesting consequences for linear and branching 
time properties. Consider the case where a linear-time property is specified by 
expressing its negation as a a Biichi word automaton ~ this is the usual approach 
to model checking a linear-time property. From such an automaton i3, one may 
construct a universal dual automaton B with the same transition relation and 
a complemented Biichi acceptance condition [20]. Universality means that B 
accepts a computation if all runs on it are accepting. This dual automaton can 
also be viewed as a universal tree automaton for the branching time property 
“All computations of M are accepted by i?” . As B is universal, there is a single 
run tree of B on M. Applying Theorem 3, one obtains the consequence given 
below. For pure branching time properties, proof-vacuity is strictly stronger, as 
shown by the examples discussed previously. 

Theorem 4 Let B he a universal co-Buchi tree automaton that expresses that 
a linear-time correctness property holds for all computations of M . If M \= B, 
the verification is formula vacuous ijf it is proof (optimal-proof) vacuous. 

Linear Properties as LTL Formulas. Theorem 4 showed that, for linear-time 
properties expressed as (negated) automata, proof vacuity coincides with formula 
vacuity. The situation is different if formula vacuity is applied to LTL formulas. 
Intuitively, this is because model checking is carried out with an automaton 
derived from an LTL formula (cf. [II]), where each automaton state represents 
a set of subformulas. Thus, vacuity at the level of the automaton states makes 
possible finer distinctions than vacuity at the level of formulas. 

This difference can be seen with an LTL formula that is similar - though 
not equivalent - to the ACTL formula defined previously: X(F(P) V F(Q)). This 
holds for the TS in Figure 3(b); however, so does X(F(P)), which is obtained by 
setting the F(Q) subformula to false. A universal co-Biichi automaton for this 
formula is defined by the state set {0, 1,2} with initial state 0, and transitions 

^ ( P^Q) ^ 2 2, and the co-Biichi acceptance con- 

dition FG(2). This automaton is verified without vacuity (either by the old or, 
from Theorem 4, by the new formulation). 

We can show equivalence of the formula and automaton vacuity definitions, 
however, for a subclass of LTL formulas. This class, LTL'^®*, is defined in [19], 
where it is shown that it represents exactly those LTL properties that are defin- 
able in ACTL. In fact, the ACTL property equivalent to checking that 4> holds 
on all computations can be obtained simply by attaching the A path operator to 
each temporal operator in (f>-, we refer to this ACTL formula as . Any ACTL 
property has a direct translation to an equivalent mu-calculus property, so we 
may consider also as a mu-calculus property. The precise syntax is given be- 
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low. In this syntax, p is a propositional state predicate, so that every V -choice 
can be resolved deterministically at a state. 

<? ::= p I ^ A I (p A V (^p A <?) | X(^) | ((p A ^)U(^p A ^)) | ((p A 
^)W(^p A <?)). 

Theorem 5 for an LTL'^^* formula (f> and TS M, the verification of (f> on M 
is formula-vacuous iff the verification of 4>-\- on M is proof -vacuous. 

Proof. Consider any positive sub-formula f) of fi. Then, A{(p[tjj := false]) is 
equivalent to '■= false])~^, which is identical to := false] by the 

nature of the transformation. Thus, (j) is true formula-vacuously for M with if 
as the witness subformula iff (/>“*" is true formula-vacuously for M , with as 
the witness subformula. From the determinism inherent in the syntax, there is 
a unique run tree for the verification of (ffi~ . By Theorem 3, formula- vacuity for 
(jfi~ coincides with proof-vacuity. □ 

4 Conclusions and Related Work 

This paper argues that the current formulation of vacuity is too broad. This con- 
clusion is supported both by examples, and by a theorem showing that formula- 
vacuity is equivalent to the existence of a vacuous proof. We propose a new, 
narrower formulation, which needs to inspect only the invariants of a single, 
maximally inclusive proof. Besides resolving anomalies with the earlier formu- 
lation, the new one can be checked more efficiently, by using information that 
is easily gathered during model checking. This should translate to a significant 
advantage in practice. In fact, checking proof vacuity with COSPAN is trivial, 
since the model checker produces a list of game-unreachable automaton states; 
the verification is vacuous iff this list contains a valid automaton state. 

The most closely related work in vacuity detection has been discussed 
throughout the paper. As pointed out in [18], vacuity is but one of several ap- 
proaches to further inspecting a valid answer from a model checker. One may 
also examine the state “coverage” (cf. [14, 13,4]). Vacuity may be viewed, dually, 
as a form of specification coverage, since it checks whether certain sub-formulas 
are redundant for the verification. More generally, producing justifications in 
the form of deductive proofs (cf. [21,22]) or as interactive games or tableaux 
[15,26,25] can offer deeper insight into why a property holds (or fails) of the 
program, since these take ranking functions also into account (cf. the discussion 
on optimality). 

An interesting research topic is to recognize “partial” vacuity in a way that 
produces useful results. For instance, consider a tree-structured transition system 
where the left subtree of the root satisfies the request-grant property vacuously 
(by never sending requests), while the right subtree satisfies it non- vacuously 
(there is at least one request). Neither the earlier formulation nor the one pro- 
posed here would consider this to be a vacuous verification, yet there is clearly 
something odd in this proof. It seems likely that a more detailed analysis of 
correctness proofs will enable the detection of such instances. 



68 



Kedar S. Namjoshi 



Acknowledgements 

This work was supported in part by grant CCR-0341658 from the National 
Science Foundation. 



References 

1. R. Armoni, L. Fix, A. Flaisher, O. Grumberg, N. Piterman, A. Tiemeyer, and M.Y. 
Vardi. Efficient vacuity detection in linear temporal logic. In CAV, volume 2725 
of LNCS. Springer- Verlag, 2003. 

2. D. Beatty and R. Bryant. Formally verifying a microprocessor using a simulation 
methodology. In 31st DAC. IEEE Computer Society, 1994. 

3. I. Beer, S. Ben-David, C. Eisner, and Y. Rodeh. Efficient detection of vacuity in 
ACTL formulas. In CAP, volume 1254 of LNCS, 1997. (full version in FMSD, 
18(2), 2001). 

4. H. Chockler, O. Kupferman, R.P. Kurshan, and M.Y. Vardi. A practical approach 
to coverage in model checking. In CAV, volume 2102 of LNCS. Springer- Verlag, 
2001 . 

5. E.M. Clarke and E. A. Emerson. Design and synthesis of synchronization skeletons 
using branching time temporal logic. In Workshop on Logics of Programs, volume 
131 of LNCS. Springer- Verlag, 1981. 

6. Y. Dong, B. Sarna-Starosta, C.R. Ramakrishnan, and S.A. Smolka. Vacuity check- 
ing in the modal mu-calculus. In AMAST, volume 2422 of LNCS. Springer- Verlag, 
2002 . 

7. E. A. Emerson and J. Y. Halpern. “Sometimes” and “Not Never” revisited: on 
Branching versus Linear Time Temporal Logic. J.ACM, 33(1):151-178, January 
1986. 

8. E.A. Emerson. Temporal and modal logic. In Jan van Leeuwen, editor. Handbook 
of Theoretical Computer Science, Vol. B: Formal Methods and Semantics. Elsevier 
and MIT Press, 1990. 

9. E.A. Emerson and C.S. Jutla. Tree automata, mu-calculus and determinacy (ex- 
tended abstract). In FOCS, 1991. 

10. E.A. Emerson and C-L. Lei. Efficient model checking in fragments of the proposi- 
tional mu-calculus (extended abstract). In LICS, 1986. 

11. R. Gerth, D. Peled, M.Y. Vardi, and P. Wolper. Simple on-the-fly automatic 
verification of linear temporal logic. In PSTV. Ghapman & Hall, 1995. 

12. R.H. Hardin, Z. Har’el, and R.P. Kurshan. GOSPAN. In CAV, volume 1102 of 
LNCS, 1996. 

13. Y. Hoskote, T. Kam, P-H. Ho, and X. Zhao. Coverage estimation for symbolic 
model checking. In 37th DAC. ACM Press, 1999. 

14. S. Katz, D. Geist, and O. Grumberg. “Have I written enough properties?” A 
method for comparison between specification and implementation. In CHARME, 
volume 1703 of LNCS. Springer- Verlag, 1999. 

15. A. Kick. Generation of witnesses for global mu-calculus model checking, available 
at http://citeseer.ist.psu.edu/kick95generation.html, 1995. 

16. D. Kozen. Results on the propositional mu-calculus. In ICALP, volume 140 of 
LNCS. Springer- Verlag, 1982. 

17. O. Kupferman and M.Y. Vardi. Vacuity detection in temporal model checking. In 
CHARME, number 1703 in LNGS. Springer- Verlag, 1999. (full version in STTT 
4(2), 2003). 



An Efficiently Checkable, Proof-Based Formulation 



69 



18. O. Kupferman and M.Y. Vardi. Vacuity detection in temporal model checking. 
STTT, 4(2), 2003. 

19. M. Maidl. The common fragment of CTL and LTL. In FOCS, 2000. 

20. Z. Manna and A. Pnueli. Specification and verification of concurrent programs by 
V-automata. In POPL, 1987. 

21. K. S. Namjoshi. Certifying model checkers. In CAV, volume 2102 of LNCS, 2001. 

22. D. Peled, A. Pnueli, and L. D. Zuck. From falsification to verification. In FSTTCS, 
volume 2245 of LNCS, 2001. 

23. A. Pnueli. The temporal logic of programs. In FOCS, 1977. 

24. M. Purandare and F. Somenzi. Vacuum cleaning CTL formulae. In CAV, volume 
2404 of LNCS. Springer- Verlag, 2002. 

25. P. Stevens and C. Stirling. Practical model-checking using games. In TACAS, 
volume 1384 of LNCS. Springer- Verlag, 1998. 

26. S. Yu and Z. Luo. Implementing a model checker for LEGO. In FME, volume 1313 
of LNCS, 1997. 



Termination of Linear Programs 



Ashish Tiwari* 

SRI International 

333 Ravenswood Ave, Menlo Park, CA, USA 
tiwari@csl . sri . com 



Abstract. We show that termination of a class of linear loop programs 
is decidable. Linear loop programs are discrete-time linear systems with 
a loop condition governing termination, that is, a while loop with lin- 
ear assignments. We relate the termination of such a simple loop, on all 
initial valnes, to the eigenvectors corresponding to only the positive real 
eigenvalues of the matrix defining the loop assignments. This character- 
ization of termination is reminiscent of the famous stability theorems in 
control theory that characterize stability in terms of eigenvalues. 



1 Introduction 

Dynamical systems have been studied by both computer scientists and control 
theorists, but both the models and the properties studied have been different. 
However there is one class of models, called “discrete-time linear systems” in 
the control world, where there is a considerable overlap. In computer science, 
these are unconditional while loops with linear assignments to a set of integer 
or rational variables; for example, 

while {true) { x:=x — y;y:=y }. 

The two communities are interested in different questions: stability and control- 
lability issues in control theory against reachability, invariants, and termination 
issues in computer science. In recent years, computer scientists have begun to ap- 
ply the rich mathematical knowledge that has been developed in systems theory 
for analyzing such systems for safety properties, see for instance [17, 12, 11]. 

One of the most basic results in the theory of linear systems, both discrete- 
time and continuous-time, is the characterization of the stability of linear sys- 
tems in terms of the eigenvalues of the corresponding matrix. In this paper, we 
are interested in termination of simple while loop programs, such as the one de- 
scribed above, but with nontrivial loop guards. We present results that relate the 
termination of such linear programs to eigenvalues of the corresponding matrix, 
analogous to the stability characterization in control theory. Our characterization 
also yields decidability of the termination problem for such programs. Although 
linear programs are similar to discrete-time linear systems, the termination char- 
acterization of linear programs is more complex than, though reminiscent of, the 
stability characterization for both continuous- and discrete-time linear systems. 

* Research supported in part by the National Science Foundation under grant CCR- 
0311348 and CCR-0326540. 
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Linear loop programs, as studied in this paper, are specialized piecewise affine 
systems, which themselves are special kinds of nonlinear systems. While several 
properties, such as reachability, stability, and controllability, are decidable for 
linear systems [10, 16], they soon become undecidable even when a “little” non- 
linearity is introduced [16]. In particular, this is also true for piecewise affine 
systems [3,4], see also Section 6. In this context, it is interesting to note that 
termination is decidable for linear loop programs. 

Techniques to prove termination of programs have attracted renewed atten- 
tion lately [7, 5, 6, 13]. The popular approach to prove termination is through the 
synthesis of a ranking function, a mapping from the state space to a well-founded 
domain, whose value monotonically decreases as the system moves forward. This 
line of research has focused mostly on generating linear ranking functions - some 
effective heuristics have been proposed [5, 6] and recently a complete method was 
presented in [13] for a model motivated by [14]. This paper investigates termina- 
tion at a more basic theoretical level. The main result establishes the decidability 
of the termination problem for programs of the form (in matrix notation) 
while {Bx > b) { x := Ax -I- c } 

where Bx > b represents a conjunction of linear inequalities over the state vari- 
ables X and X := Ax+c represents the linear assignments to each of the variables. 
The variables are interpreted over the reals 3? and hence the state space is 3?". 
This class of programs is simpler than the ones considered in [5, 6, 13]. Although 
a program may not be presented in this form, termination questions can often 
be reduced to this basic form after suitable simplifications and transformations. 

We approach the termination issue of the above program as follows. We first 
consider the homogeneous version, 
while (Bx > 0) { x := Ax }, 

and note that the condition Bx > 0 defines a region smaller than a half space 
of 3?". Now if a state x = c is mapped by A (in one or more iterations) to 
something on the other side of the half space, then the program will terminate 
on this state (since the loop condition will become false). In particular, this 
means that the program always terminates on states specified by eigenvectors c 
corresponding to negative real eigenvalue and complex eigenvalues. The former 
ones are mapped by A to their negative image, while the latter ones are rotated 
gradually until they reach the other half space (where Bx > 0 is false). Thus, 
our first result is that, for purposes of termination, the eigenspace corresponding 
to only the positive real eigenvalues of A is relevant (Section 2 and Section 3) . 

In the case when all eigenvalues are positive, the eigenvectors corresponding 
to larger eigenvalues dominate the behavior of the program, that is, after suffi- 
ciently many iterations, the values of the state variables will be governed almost 
solely by the influence of the largest eigenvalue. Based on this, we can guess a 
witness to nontermination and test if the guess is correct by checking satisfiabil- 
ity of a set of constraints (Section 4). Finally, we show that the nonhomogeneous 
case can be reduced to the homogeneous case (Section 5). 
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1.1 Notation 

We use standard mathematical notation for representing vectors and matrices. 
We follow the convention that upper case letters denote integer constants 

and lower case letters i,j, . . . denote indices ranging over integers. In particular, 
a (fV X 1) column matrix is called a vector, and it is denoted by c,d whenever 
the components of the vector are known constants; and hy x,y whenever the 
components of the vector are all variables. A {N x N)-matrix with constant 
entries at the (i, j)-position is denoted hy A = (uy). A diagonal matrix 
A = (ttij) = diag(\i , . . . , Xn) has an = Xi and Oy = 0 otherwise. The transpose 
of a matrix A = {atj) is a matrix B = (bij) such that bij = aji, and it is 
denoted by . Note that the transpose of a column vector c is a row vector 
c^. Using juxtaposition for matrix multiplication, we note that c^d denotes the 
inner product, of the vectors c and d. 

We will also denote matrices by specifying the submatrices inside it. So, for 
instance, diag{Ji, . . . , Jk) would denote a matrix which has matrices Ji, . . . , Jk 
on its “diagonal” and 0 elsewhere. If A is a {N x N)-matrix and c is a vector 
such that Ac = Ac, then c is called an eigenvector of A corresponding to the 
eigenvalue X. The effect of repeated linear assignments {x := Ax) becomes much 
more explicit when we do a change of variables and let the new variables y be 
the eigenvectors of A. In particular, we get transformed assignments of the form 
y := Xy. If there are N linearly independent eigenvectors, then A is said to 
be diagonalizable and the assignments on the new variables will be of the form 
y := diag{Xi, . . . , XN)y- But this is not possible always. However, instead of 
a diagonal matrix, we can always get an almost diagonal, the so-called Jordan 
form, matrix [9]. 

2 The Homogeneous Case 

The presentation in this paper is incremental - going from syntactically simple 
to more complex programs. In this section, we consider linear programs of the 
following form: 

PI: while {c^x>0) { x := Ax }. 

The variables in x are interpreted over the set 3? of reals. The assignment x := 
Ax is interpreted as being done simultaneously and not in any sequential order. A 
list of sequential assignments can be modified and presented in the form x := Ax, 
see Example 1. We say that the Program PI terminates if it terminates on all 
initial values in 3? for the variables in x. 

Theorem 1. If the linear loop program PI, defined by an {N x N) -matrix A and 
a nonzero N x 1-vector c, is nonterminating then there exists a real eigenvector 
V of A, corresponding to positive eigenvalue, such that > 0. 

Proof. (Sketch) Suppose the linear loop program is nonterminating. Define the 
set NT of all points on which the program does not terminate. 

NT = {x G 3?^ : <Ax > 0, Ax > 0, (A A?x > 0, . . . , (A A^x > 0, . . .}. 
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By assumption, NT ^ 0. The set NT is also 7l-invariant, that is, if u G NT, 
then Av G NT. Note that NT is an affine subspace of 3?-^, that is, it is closed 
under addition and scalar multiplication by positive reals. Hence, it is convex^. 
Define T = 3?^ — iVT to be the set of all points where the program terminates. 
Define the boundary, dNT, of NT and T as the set of all v such that (for all e) 
there exists a point in the e-neighborhood of v that belongs to T and another 
that belongs to NT. 

Let NT' be the completion of NT, that is, NT' = NT U dNT. Since NT is 
H-invariant, it means that A maps NT into NT. By continuity we have that A 
also maps NT' into NT'. Now, NT' is convex, and if we identify points x and 
y, written as x ^ y, that are nonzero scalar multiples of each other (x = Ay), 
then the resulting set {NT'/ ~) is closed and bounded (as a subset of 3?"“^). By 
Brouwer’s fixed point theorem [15], it follows that there is an eigenvector v (with 
positive eigenvalue) of A in NT'. For all points u G NT, we know c^u > 0. By 
continuity, for all points u G NT', we have c^u > 0. ■ 

If, in fact, it is the case that c^v > 0, then u is a witness to nontermina- 
tion of the loop. Thus, Theorem 1 can be used to get the following conditional 
characterization of nontermination. 

Corollary 1. If there is no real eigenvector v of A such that c^v = 0, then 
the linear loop program defined by A and c is nonterminating iff there exists an 
eigenvector v on which the loop is nonterminating. 

Example 1. The effect of two sequential assignments x := x — y; y := x + 2y is 
captured by the simultaneous assignment 




The matrix A has no real eigenvalues. Let c yf 0 be any (nonzero) vector. The 
condition of Corollary 1 is trivially satisfied. And since there is no real eigenvalue, 
we immediately conclude that the linear loop program specified by A and (any 
nonzero vector) c is terminating. 

Example 2. Let 0 be a fixed number. Consider the following program: while 
{z — y>0) { X := Ax }, where A = [cos 0, — sin 0, 0; sin 0, cos 0, 0; 0, 0, 1]. Thus, 
A simply rotates the 3-D space by an angle 6 about the z-axis. The set of points 
where this program is nonterminating is NT = {{x,y, z) : z > a; sin (f + y cos </> : 
4> = nO, n = 0, 1, 2, . . .}. For 9 that is not a factor of tt, dNT = {{x, y, z) : z"^ = 
x^ + J/^} (eliminate </> from above). Note that there is the eigenvector (0,0, 1) in 
NT corresponding to positive eigenvalue 1. As another example of the boundary, 
note that if 6* = 7 t/ 4, then dNT contains 8 hyperplanes, each one is mapped by 
A to the next adjacent one. 

2.1 Generalizing the Loop Condition 

The loop condition can be generalized to allow for a conjunction of multiple 
linear inequalities. We continue to assume that all linear inequalities and linear 

A set NT is convex if au + (1 — a)v G NT whenever u,v £ NT and 0 < a < 1. 
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assignments consist only of homogeneous expressions. Let i? be a (M x A^)-matrix 
(with rational entries) and A be a {N x A^)-matrix. We consider programs of the 
following form: 

P2: while {Bx > 0) { x := Ax } . 

Theorem 1 and Corollary 1 immediately generalize to programs of the form P2. 

Theorem 2. If Program P2, specified by matrices A and B, is nonterminating, 
then there is a real eigenvector v of A, corresponding to a positive real eigenvalue, 
such that Bv > 0. 

Corollary 2. Assume that for every real eigenvector v of A, corresponding to a 
positive eigenvalue, whenever Bv > 0, then it is actually the case that Bv > 0. 
Then, the Program P2, defined by A and B, is nonterminating iff there exists 
an eigenvector v on which the loop is nonterminating. 

Example 3. Consider the program: 

while {x — y > 0) { x := —x + y, y '■= y } . 

The matrix A = [—1,1; 0,1] has two eigenvalues, 1 and —1. The eigenvector 
corresponding to the eigenvalue 1 is [1; 2] and we note that l — Hence, it 

follows from Corollary 2 that the above loop is terminating. 



2.2 Two Variable Case 

Theorem 1 and Theorem 2 show that nonterminating linear loops almost always 
have a witness that is an eigenvector of the matrix A. The only problematic 
case is when the eigenvector is on the boundary, dNT, so that it is not clear if 
indeed there are points where the program is nonterminating. However, in the 
2-dimensional case, that is, when there are only two variables, the region NT will 
be a sector and it can be specified by its two boundary rays. Thus, if NT yf 0, 
then there exists an H-invariant sector, given by a^x [> 0 A b^x >0 A a: yf 0 
where [> € {>,>}, on which the loop condition always evaluates to true. This 
can be expressed as a quantified formula over the theory of (linear) arithmetic 
interpreted over the reals, which is a decidable theory. 

Theorem 3. A two variable linear loop program, 
while (Bx >0) { x := Ax }, 

is non-terminating iff the following sentence in true in the theory of reals 

3a,b.[3x.(j>{a,b,x) A yx.{<j){a,b,x) {Bx > 0 A (p{a,b, Ax)))] 

where 4>{a, b, x) denotes oAx [> 0 A b^ x >0 A a; yf 0 and \> G {>, >}. 

This theorem gives a decision procedure for termination of two variable loops 
since the formula in Theorem 3 can be tested for satisfiability. Theorem 3 can- 
not be generalized to higher dimensions since there may not be finitely many 
hyperplane boundaries, as Example 2 illustrates. 
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3 Reducing the Homogeneous Case 



Corollary 2 falls short of yielding decidability of termination of homogeneous 
linear programs. But it hints that the real eigenvalues and the corresponding 
eigenvectors are relevant for termination characteristics of such programs. In 
this section, we will formally show that the nonpositive eigenvalues (and the 
corresponding eigenspace) can be ignored and the termination problem can be 
reduced to only the eigenspace corresponding to positive real eigenvalues of the 
matrix A. 

We first note that the Program P2 from Section 2.1 can be transformed by 
an invertible (bijective) transformation, preserving its termination properties. 

Proposition 1. Let P be an invertible linear transformation. The program 
P2: while {Bx > 0) { x := Ax } 
is terminating iff the program 

P3: while (BPy>0) { y := P~^APy } 
is terminating. 

Proof. If Program P2 does not terminate on input x := c, then Program P3 will 
not terminate on input y := P~^c. Conversely, if Program P3 does not terminate 
on input y := d, then Program P2 will not terminate on input x := Pd. ■ 
Thus, Proposition 1 is just about doing a “change of variables”. It is a well 
known result in linear algebra [9, 1] that using a suitable change of variables, a 
real matrix A can be transformed into the form, diag{Ji, J 2 , ... , Jk), called the 
real Jordan form, where each Ji is either of the two forms: 
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where G 3? is a real whereas Di is a (2 x 2 )-matrix of the form 



-A 



For 



uniformity, the second Jordan block will denote both the forms. When it denotes 
the first form, then Di and I are both (lx 1 (-matrices and we will say Hi G 3? 
and treat it as a real. We define |Hi| = |Ai| in the first case and |Hi| = -P j3f 
in the second case. 

Let P be the real {N x fV)-matrix such that P~^AP = diag{J\, . . . , Jk)- 
Thus, Program P2, specified by matrices A and B, 

P2: while {Bx > 0) { x := Ax }, 
can be transformed into the new Program P3, 

P3: while {BPy > 0) { y \= diag{Ji, . . . , Jx)y }• 

Proposition 1 means that we can focus on termination of Program P3. Partition 
the variables in y into j/i, j /27 • • • , Vk and rewrite the Program P3 as 

P3: while (HiyiH > 0) { yi := Jiyr, . . . ;yK ■= JkVk }. 

where Bfs are obtained by partitioning the matrix BP. Let S = {1,2,..., K} be 
the set of indices. Define the set = {i G S : Di G iR, Di > 0} . The following 
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technical lemma shows that we can ignore the state space corresponding to 
negative and complex eigenvalues, while still preserving the termination behavior 
of the Program P3. 

Lemma 1. The Program PS, as defined above, is terminating iff the program 
P4: while (J2jes+ > 0) { yj := JjVj; for j G S+ } 
is terminating. 

Proof. (Sketch) If the Program P4 does not terminate on input yj := Cj, where 
j G S+, then the Program P3 does not terminate on input yj := Cj for 
and yj 0 for j ^ S+. 

For the converse, assume that Program P3 does not terminate on input yj := 
Cj, j G S. Consider the m-th loop condition, where Bjm 

denotes the m-th row of Bj. Assume that yj has Nj components, yj^,yj^, . . . , 
yjN — 1 ’ "''^here each yj^^ is either a2xloralxl matrix (depending on whether 
is 2 X 2 or 1 X 1.) The value of yj, at the t-th iteration, is given by 
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Define fm{i) to be the value of the expression of the m-th condition at 



i-th iteration, that is, fm{i) = X 
Nj = max{Nj : \Dj\ = max{\Di\ : i G 



jeS 



that for large i, the term 



Nt-1 



yj{i). Let J be an index such that 
S'}}. We claim, without further proof, 

D/~^-’'^^yj> ,-i(0) will dominate the 



value of fm{i)- If S ^ S+, then the sign of this dominating term, and conse- 
quently the sign of fm{i), will fluctuate (between positive and negative) as i 
increases. By assumption, this does not happen. Hence, J G S+ and hence, for a 
large enough i, say i > Im, we have Xj6S+ ^jmyj{i) > 0. For each condition 
m, we get an index 1^. Set I to be the maximum of all Im’s. For t > /, it is the 
case that Xjes+ ^jmyj{i) > 0 for all m. 

Define new initial conditions for Program P3 and Program P4 as follows: 
yffO) := yffl) for all i G S+ and yffff) := 0 for all i ^ S+. Program P3 does 
not terminate on this new initial conditions. Hence, Program P4 also does not 
terminate on it. This completes the proof. ■ 



Example 4- We borrow the following example from [13], 

Q1 : while (x > 0 A y > 0) { x := —2x + lOj/; y := y } . 

The matrix A has two eigenvalues —2 and 1, and it is clearly diagonalizable. In 
fact, consider the transformation matrix P, 



A = 





P-^AP = 



-2 0 
0 1 



BP = P 
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Transforming Program Q1 by P, we get 

Q2: while + 10x2 > 0 A 3x2 > 0) { xi := — 2xi; X 2 := X 2 }. 

Lemma 1 says that the termination of Program Q1 and Program Q2 can be 
decided by just considering the termination characteristics of: 

Q3: while (10x2 > 0 A 3x2 >0) { X 2 := X 2 }. 

In fact, the point Xi = 0, X 2 = 1 makes Program Q3 nonterminating, and corre- 
spondingly, the point x = 10, y = 3 makes Program Q1 nonterminating. 

4 All Positive Eigenvalues 

Lemma 1 reduces the termination of Program P2 of Section 2.1 to testing ter- 
mination of Program P4, which is given as: 

P4; while (Piyi H h > 0) { yi := Jiyi, . ■ . ;yr ■= JrUr } 

where each of the Jordan blocks Ji corresponds to a positive real eigenvalue A^. 

The value of variables in yj, after the t-th iteration, are given by Equation 1, 
where Dj = Xj is a positive real now. As before, assume that the /c-th loop 
condition is written as Bikyi + i? 2 fcl /2 -k • • • -I- Brkyr > 0. We can express the 
requirement that the fc-th loop condition be true after the z-th iteration as 



B\kyi{i) + B 2 ky 2 {i) -k • • • -k Brkyr{i) > 0. 

Expand this using Equation 1 and let Ckij denote the result of collecting all 
coefficients of the term („ 1 i)A; Now, the /c-th loop condition after z-th 

iteration can be written as 

AiC'feiiy(O) -k zAj^ ^C'fei22/(0) + • • • + ^ ^ A]^ *• ^ ^C'feiniy(O) + 

. . . -k 

A;Cfc,iy(0) + zA;-iCfc,2y(0)-k---+ J A(,-("’-i)Cfc™,2/(0) > 0, 

which we will denote by Condk{y{i))- If two eigenvalues A; and Am are the same, 
then we assume that the corresponding coefficients (of (j)Ap^ and (j)A(;7^) 
have been merged in the above expression, so that without loss of generality, we 
can assume that each A; is distinct and such that 0 < Ai < A 2 < • • • < A^ . 

Define the set Ind = {11, 12, ... , Im, 21, 22, ... , 2zz2, . . . , rl, r2, . . . , rtir} and 
an ordering A on this set so that elements on the right are greater-than elements 
on the left in the above set. The idea here is that nonzero terms in Condk that 
have larger indices grow faster asymptotically (as z increases). 

We decide the termination of Program P4 using the following three step 
nondeterministic algorithm: 

(1) For each of the m loop conditions, we guess an element from the set Ind. 
Formally, we guess a mapping index : {1,2,..., m} i-^- Ind. Intuitively if y(0) is 
a witness to nontermination, then index{k) is chosen so that Ck^index(k)y{^) > 0 
(and this is the dominant summand) and Ck,indy{0) = 0 for all ind >~ index {k). 
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(2) Build a set of linear equality and inequality constraints as follows: from the 
/c-th loop condition, generate the following constraints, 

Ck,indZ = 0, if ind >- index {k) 

Ck,ind,z >0, if ind = index {k) 

Condk{z{i)) >0, if 0 < t < Il 2 {index{k)) 

where 77i and II 2 denote the projection onto the first and second components 
respectively. Note that the unknowns are just 2 : (the initial values for y). 

(3) Return “nonterminating” if the new set of linear inequalities and linear equa- 
tions is satisfiable (in 3?”), return “terminating” otherwise. 

We state the correctness of the algorithm, without proof, in Lemma 2 and 
follow it up by a summary of the complete decision procedure in the proof of 
Theorem 4. 

Lemma 2. The nondeterministic proeedure returns “nonterminating” iff the 
Program P4 

P4: while (Riyi H \- BrVr > 0) { Vi ■= JiVT, ■ ■ ;yr ■= JrVr } 

is nonterminating. 

Theorem 4. The termination of a homogeneous linear program of the form 
P2: while {Bx > 0) { x := Ax } 
is decidable. 

Proof. We decide termination of the Program P2 as follows: If A has no positive 
real eigenvalues, then return “terminating” (Corollary 2). If every real eigenvec- 
tor V corresponding to a positive real eigenvalue of A satisfies Bv < 0, then 
return “terminating” (Theorem 2). If there is a real eigenvector v corresponding 
to a positive real eigenvalue of A such that Bv > 0, then return “nonterminat- 
ing” (Corollary 2). If none of the above cases is true, then clearly A has positive 
real eigenvalues. Compute the Jordan blocks and the generalized eigenvectors 
only corresponding to the positive real eigenvalues of A. Generate a transforma- 
tion P by extending the computed set of generalized eigenvectors with any set 
of vectors in space orthogonal to that of the generated eigenvectors. Transform 
Program P2 by P as in Proposition 1. It is an easy exercise^ to note that we can 
apply Lemma 1 and reduce the termination problem to that for Program P4 of 
Lemma 1. Finally, we decide termination of Program P4 using the nondetermin- 
istic procedure of Section 4 (Lemma 2). ■ 

Example 5. Consider the program 

Q4: while (cc > 0 A y > 0) { x := x - y; y := y } 

This contains only two variables, and hence we can use Theorem 3, but for 
purposes of illustration, we apply Theorem 4 here. The matrix A = [1, —1; 0, 1] 
has a positive real eigenvalue 1. The vector given by a: = l,y = 0 is an eigen- 
vector corresponding to this eigenvalue. Hence, we cannot apply Theorem 2 or 

^ We cannot apply Lemma 1 directly since we did not compute the real Jordan form 
of the “full” matrix A, but only of a part of it. But we do not need to compute the 
full real Jordan form to get Program P4. 
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Corollary 2. The real Jordan form of A\s A' = [1, 1; 0, 1] and the corresponding 
transformation matrix is P = [—1, 0; 0, 1]. The new program is: 

Q5: while {—x >0Ay>0) {x :=x + y; y :=y} 

The general solutions are given by 

x{i) = l*a;(0) + tj/(0) = x(0) + tj/(0) 
y{i) = Py(0) = y(0) 

The condition Condi corresponding to the loop condition —x > 0 is — a;(0) — 
fy(0) > 0 and similarly Cond 2 is j/(0) > 0. There is no choice for index{2), but 
there is a choice for index {!): it can be either 11 or 12. 

In the first case, we generate the following constraints from the first loop 
condition: {—y = 0, —x > 0, —x — 0y> 0}. From the second loop condition, we 
only generate the constraint y > 0. Together, we detect an inconsistency. 

In the second case, we generate the following constraints from the first loop 
condition: {—y > 0, —x — Oy > 0, —x — ly > 0}. Again, from the second loop 
condition, we generate the constraint y > 0, which is inconsistent with the above 
constraints. Hence, we conclude that the Program Q4 is terminating. 



5 Nonhomogeneous Programs 

Now we consider linear programs of the following form: 

P5: while {Bx > b) { x := Ax + c } 

We can homogenize this program and add an additional constraint on the ho- 
mogenizing variable to get the following 

P6: while {Bx — bz > 0 A z > 0) { x := Ax + cz; z := z } 
where z is a new variable. The homogeneous program reduces to the original 
program if we substitute 1 for z. 

Proposition 2. The nonhomogeneous Program P5 does not terminate iff the 
homogeneous Program P6 does not terminate. 

Proof. If Program P5 does not terminate, say on input Xi = di,i = 1,2, ... ,n, 
then Program P6 does not terminate on input z = l,Xi = di,i = 1,2, ... ,n. 

For the converse, assume that Program P6 does not terminate on input xq = 
do, Xi = di,i = 1,2, ... ,n. If do > 0, then we can scale this input to get a new 
state xo = l,Xi = di/do, i = 1,2, . . .,n. The behavior of Program P6 will be the 
same on the scaled input, and hence Program P5 would not terminate on this 
input either. ■ 

Thus we can reduce the decidability of termination of nonhomogeneous pro- 
grams to that of homogeneous programs. Together with Theorem 4, we get the 
following result. 

Theorem 5. The termination of a nonhomogeneous linear program of the form 
P5 : while {Bx > b) { x := Ax -P c } 
is decidable. 
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Remarks on Computability and Complexity. The three step nondeter- 
ministic algorithm described in Section 4 is clearly in class NP. However, the 
nondeterminism can be eliminated by a careful enumeration of choices. The idea 
is that we always start with the guess index {k) = rur for each loop condition 
k and if the resulting formula is unsatisfiable, then we readjust the guess and 
gradually set index{k) to smaller elements in the set Ind. We do not formalize 
this detail in this paper because it is not central to the decidability result. 

The reduction described in Section 3 requires computation of a real eigen- 
vector. Computing with real numbers is not easy, but if the input matrices are 
over the rationals, then all the real numbers that arise in the computation are 
algebraic. Computing with algebraic numbers is theoretically possible, but it 
can be expensive (since the theory of real-closed fields has a double exponential 
lower bound). But this should not be a serious problem for two reasons. First, 
for most problems arising in practice, we expect the eigenvalues to be rational, 
and computation with rationals can be done very efficiently. Second, even if the 
eigenvalues are irrational, the dimension of the problem is unlikely to be so high 
that computing with algebraic numbers will become a bottleneck. However, these 
issues have to be experimented with and that is left as future work. 

We believe that the Jordan form computation step in the decision procedure 
outlined in this paper can be eliminated in most cases in practice. This can be 
achieved by perturbing the system by a little (for example, replacing conditions 
c > 0 by c > e for some small constant e) and studying the termination property 
of the perfumed system. We conjecture that Corollary 2 can be strengthened for 
the case when the homogenization variable violates the condition. This would 
allow us to avoid the Jordan decomposition step in many cases. 

The decision procedure for termination of linear loop programs can be 
adapted to the case when the variables are interpreted over the integers and 
rationals. If there are “a lot” of witnesses to nontermination, then there will 
be a rational witness too, since the rationals are dense in reals. If not, then we 
can detect this case using a specialized wrapper. This leads us to conjecture the 
following. 

Conjecture 1. The termination of Program P5, as defined in Theorem 5, when 
all variables are interpreted over the set of integers, is decidable. 

6 The General Case 

We consider the termination of a set of nondeterministic linear conditional as- 
signments, written in a guarded command language [8] as, 

[ Bix > 6i — > X := Aix + Ci 

0 B2X > 62 — > X := A2X + C2 

[] BkX > bk — > X := AkX + cp, ] 

which we will write in shorthand as 

P7: ni=i{B^x>bi — > x:=AiX + Ci) 
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Counter machines can be naturally encoded as Program P7. We introduce one 
variable Xi for each counter and one variable x for the finite control (program 
counter). Encodings of conditional branches, counter increments, and counter 
decrements are straightforward. The problem of deciding if a counter machine 
halts on all inputs is undecidable, see [3], where this problem is called the mor- 
tality problem. Therefore, the problem of deciding if a program of the above 
form halts on all integer inputs is also undecidable. Note however that for the 
restricted forms of assignments (x := a;± 1) generated by the translation, termi- 
nation over reals is equivalent to termination over integers. Thus, we conclude 
that the termination problem for Program P7 is undecidable. 

Theorem 5 can be used to get an incomplete test for nontermination for Pro- 
gram P7 ~ If Program P7 is terminating, then for each f, the program while 
{BiX > bi) { X := AiX + Ci } is terminating. The converse is also true un- 
der additional commutation properties [2] amongst the k binary relations, say 
Ri, . . . , Rk, induced by the k guarded commands. In particular, one immediate 
consequence of a result in [2] is the following. 

Proposition 3. Let Program P7 and relations R\, . . . , Rk be as defined above. 
Let R = i?i U • • • U . Assume that whenever i < j, it is the case that Rj o R^ C 
RiO R* . Then, the Program P7 terminates if and only if each Ri do. 

Note that the condition above is dependent on the order Ri, . . . , Rk, which 
we are free to choose. Testing for the quasi-commutation property [2] Rj o Ri C 
Ri o R* is possible if we restrict the search to Rj o Ri C RioR\ for some finite 
1. In the special case when the z-th guarded command cannot be enabled after 
execution of the j-th guarded command (that is, Rj o Ri = 0), then the above 
inclusion is trivially true. 

In the case when the quasi-commutation property cannot be established, 
the test for nontermination can be made “more complete” by including new 
guarded transitions obtained by composing two or more of the original guarded 
commands. It is easy to see that the composition results in a linear guarded 
command. These new guarded commands can be tested for nontermination using 
Theorem 5 again. 

7 Future Work and Conclusion 

We have presented decidability results for termination of simple loop programs. 
The loops are considered terminating if they terminate on all initial real val- 
ues of the variables. The decision procedure is based on the observation that 
only the eigenvectors (and the generalized eigenspace) corresponding to posi- 
tive real eigenvalues of the assignment matrix are relevant for termination. The 
generalization to multiple linear nondeterministic guarded commands makes the 
problem undecidable. Under certain restrictive commutation conditions, termi- 
nation of multiple linear guarded commands can be reduced to termination of 
each individual simple linear loops. 

We believe that results and tools from systems theory, such as Lyapunov 
functions and control Lyapunov functions, can yield powerful tools for analyz- 
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ing software, especially for termination analysis and invariant generation. This 
avenue should be explored further in the future. 
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Abstract. This paper presents a symbolic model checking algorithm for 
Fixpoint Logic with Chop, an extension of the modal /r-calculus capable 
of dehning non-regular properties. Some empirical data about running 
times of a naive implementation of this algorithm are given as well. 



1 Introduction 

Automatic hardware verification has proved to be one of the most fruitful areas 
in computer science of the last two decades. Since the introduction of Linear 
Time Temporal Logic, LTL, to computer science [Pnu77], it is known that the 
problem of automatically verifying correctness of a program can be reduced to 
the logical problem called model checking: given a structure 21 and a formula (p, 
does 2t have the property described by p. 

Since the structures that computer scientists are interested in are often sets 
of states with transitions between them that describe a program’s behaviour in 
time, the logics used for verification are mostly modal and temporal ones. A 
lot has been argued about which is the right logic to use; whether it should be 
a linear or a branching time logic [Sti89, Var98]; whether it should rather be 
expressive or admit efficient model checking, etc. 

Regardless of which of these logics one favours, in almost all cases it will turn 
out to be a fragment of or embeddable into the modal /x-calculus C^, [Koz83], 
which extends basic multi-modal logic with extremal greatest and least fixpoints. 
In fact, incorporates the well-known temporal logics LTL, CTL [EH85], CTL* 
[EH86], as well as other logics which have gained more attention outside of 
computer science: PDL [FL79], Description Logics [WS92], etc. 

has a commonly accepted weakness: its formulas cannot in general be 
called readable. Even though might therefore not be of great use to someone 
who wants to do program verification but is unfamiliar with modal logics, it 
is nevertheless important to have efficient decision procedures for such a logic. 
They can automatically yield efficient decision procedures for more readable 
logics as well. Model checking algorithms for (fragments of) the /x-calculus have 
been presented in [Cle90, SW91, BC96, SS98] for example. Even though most 
of these algorithms are local - they do not necessarily need to exploit the entire 
underlying structure - they are all based around explicit representations of its 
state space. 



R. Alur and D.A. Peled (Eds.): CAV 2004, LNCS 3114, pp. 83—95, 2004. 
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The state-of-the-art in automatic verification however is symbolic model 
checking [BCM+92] which avoids explicit representations of states. The struc- 
tures to be checked for correctness are encoded as boolean functions which can 
space-efficiently be represented using Reduced Ordered Binary Decision Dia- 
grams [Bry86], or BDDs for short. What makes them so suitable and popular 
for program verification is the fact that all the operations needed to do model 
checking for the modal ^-calculus - hence all the operations needed to do model 
checking for a lot of other logics - can easily be carried out on BDDs. 

Many people would argue that full is more than anyone would ever want 
because of two reasons: most people’s favourite temporal logic is embeddable into 
a rather small fragment of and anything in with an alternation depth 
greater than 2 is incomprehensible anyway. However, it is not only alternation 
depth that can limit expressive power. 

Formulas of C,, can be translated into finite automata over infinite trees 
[SE84]. This shows that properties definable in the modal ^-calculus cannot be 
everything anyone would ever want. Every formula of expresses a “regular” 
property. However, lots of properties of programs one might want to check are 
inherently non-regular: something behaves like a stack, action sequences get 
repeated, something happens at the same time in all execution paths, etc. 

In [M099], Miiller-Olm has introduced Fixpoint Logic with Chop, FLC for 
short, which extends with a sequential composition operator. This extension 
yields a far greater expressive power. FLC can define certain properties which 
are not even context-free. [Lan02a] gives an idea of what properties actually are 
definable in FLC. 

Of course, there are two sides to a coin: extending to FLC does not 
retain the finite model property nor decidability of its satisfiability problem. 
However, it maintains attractiveness for automatic verification purposes since its 
model checking problem on finite models is decidable, yet PSPACE-hard. Model 
Checking algorithms for FLC based on explicit state space representations were 
given in [LS02, Lan02b]. 

Here we show that despite the PSPACE-hardness, model checking for FLC 
is feasible - even on rather large transition systems - when symbolic represen- 
tations are used. Section 2 defines syntax and semantics of FLC, and recalls the 
definition of BDDs and how to symbolically encode labelled transition systems. 
Section 3 presents a symbolic model checking algorithm for FLC and proves its 
correctness. The algorithm is based on the insight that not only all the opera- 
tions needed for model checking can easily be carried out on BDDs but even 
those needed for FLC model checking. Finally, Section 4 supports the claim that 
FLC model checking is feasible. It describes a quick and naive symbolic imple- 
mentation of a family of transition systems and the model checking algorithm. 
Furthermore, it contains some empirical data about running times of the algo- 
rithm on some FLC formulas. One of those is not equivalent to any /1^-formula. 

We omit a complexity analysis of the algorithm mainly because it is known 
that BDD-based model checking can defeat intractability results. Moreover, it 
is the experimental results which yield a better answer to the questions of feasi- 
bility. 
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2 Preliminaries 

2.1 Fixpoint Logic with Chop 

Let V = {tt, ff, g, g, . . .} be a set of propositional constants that is closed under 
complementation, V = {Z,Y,...} a set of propositional variables, and A = 
{a, 6,...} a set of action names. A labelled transition system is a graph T = 
(5, {AL- 1 a G A},L) where 5 is a set of states, for each a G A is a binary 
relation on states and L : S ^ 2^ labels the states such that, for all s G 5 : g G 
L{s) iff g ^ A(s), tt G L{s), and ff ^ L{s). We will use infix notation s t for 
transition relations. 

Formulas of FLC are given by the following grammar. 

ip ::= q \ Z \ T \ {a) \ [a] \ ipy p \ p ^p \ pZ.p \ vZ.p \ p;p 

where q G V, Z G V, and a G A We will write a for p or v. To save brackets 
we introduce the convention that ; binds stronger than A which binds stronger 
than V. To keep a strong resemblance to the syntax of the modal /r-calculus we 
will sometimes write {a)p or [a]p instead of (a); p, resp. [a]; p. 

Formulas are assumed to be well-named in the sense that each binder variable 
is distinct. Our main interest is with formulas that do not have free variables, 
in which case there is a function fp-.V^ FLC that maps each variable to its 
defining fixpoint formula (that may contain free variables). 

The set Sub{p) of subformulas of p is defined as usual, with Sub{aZ.ip) = 
{aZ.ip} U Sub(pp). 

An environment p : V — > (2*^ ^ 2^) maps variables to monotone functions of 
sets to sets. p[Z f] is the function that maps Z to f and agrees with p on all 
other arguments. The semantics |-]^ : 2“^ ^ 2“^ of an FLC formula, relative to T 
and p, is a monotone function on subsets of states with respect to the inclusion 
ordering on 2“^. These functions together with the partial order given by 

/ E P iff VA C 5 : /(A) C g(A) 

form a complete lattice with joins U and meets □. By the Tarski-Knaster The- 
orem [Tar55] the least and greatest fixpoints of functionals F : (2“^ ^ 2“^) ^ 
(2“^ ^ 2“^) exist. They are used to interpret fixpoint formulas of FLC. 

To simplify the notation we assume a transition system T to be fixed, and 
write |-]^ instead of |-] J for the semantics given in Figure 1. 

A state s satisfies a formula p under p, written s \=p p, iff s G |(p](5). If p 
is a closed formula then p can be omitted and we write |p] (5) as well as s ^ p. 

Two formulas p and ip are equivalent, written p = t/’, iff their semantics are 
the same, i.e. for every T and every p: |p] J = |V’l J- 

We introduce approximants of fixpoint formulas. Let fp{Z) = pZ.p for some 
p and let a, A G Ord, the ordinals, where A is a limit ordinal. Then 

:= ff , = p[^“/Z] , Z^=\J Z°^ 

a<A 



^ In [M099], r is called term. 
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Mp £ *5 I 5 G L{s)} 

[^Ip := P(Z) 

[r]^ := AXX 
{ifi V i/’lp := Mp U 
{ip A 'i/’lp := Mp n Mp 

[(a)]^ := AX.{s G 5 I 3t G X, s.t. s^t} 

IHl^ — AX.{s G 5 I Vt G 5,s^t G X} 

[pZ.ip]^ := □{/ : 2-® ^ 2-® I / monotone, C /} 

:= □{/ :2^ ^2^ \ f monotone, / C [pjp^z^f]} 

b;V’]p := Mp°Mp 



Fig. 1. The semantics of FLC formulas. 



If fp{Z) = i/Z.ip then 

:= tt , = ip\Z°^lZ\ , Z^= f\Z°^ 

a<A 

Note that [iZ.ip = VaeOrd vZ.ip = AaeOrd finite transition 

systems are considered Ord can be replaced by N. 

We will give a few examples of FLC formulas to make the reader familiar with 
the logic and to show how one can express certain non-regular properties in it. 
Remember that FLC is an extension of the modal /x-calculus whose formulas are 
commonly perceived as not being the most understandable already. For a better 
understanding of what property actually is expressed by a given FLC formula 
consider [Lan02b], which defines model checking games for FLC, or [Lan02a], 
which maps FLC on linear models into the Chomsky-hierarchy. 

Example 1. The standard example of a non-regular and even non-context-free 
property is the following, taken from [M099]. It describes the language {a"6"c” | 
n G N} on finite, linear models. 

ip := A(a,6);x(c) Ax(a);A(&,c) 

'4’{x,y) := hX.tW {x);X;{y) 

X(x) := yX.TV {x);X 

Note how the modal constructs (•) and [•] can be used behind variables to produce 
genuine context-free effects. 

The best way to understand FLC formulas is to think of them as alter- 
nating, context-free grammars [IJW92]. Variables are non-terminals, formulas 
of the form (a) or [a] are terminals, the boolean operators translate into non- 
deterministic, resp. universal choices in these grammars, and r plays the role 
of the empty word. Finally, least fixpoint variables can only be replaced finitely 
often (recursion) while greatest fixpoint variables can be replaced infinitely often 
(co-recursion). 

Then formula ip{x,y) for example generates words of the form (cc)”; (y)" for 
any n G N. 
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Example 2. [LS02] Let A = {a, b} and 

ip := vY.\b]f± A [a](r^Z.[6] A [a](^; Z)); (([a]ff A V Y) 

Formula ip expresses “the number of bs never exceeds the number of as” which 
is non-regular and, therefore, is not expressible in C^. This is an interesting 
property of protocols when a and b are the actions send and receive. 

The subformula f/' = i^Z.[b] A [a]{Z; Z) expresses “there can be at most one b 
more than there are as” . This can be understood best by unfolding the fixpoint 
formula and thus obtaining sequences of modalities and variables. It is easy to 
see that replacing a Z with a [6] reduces the number of Zs whereas replacing it 
with the other conjunct adds a new Z to the sequence. 

Then, [&]ff A [a]tp postulates that at the beginning no b is possible and for 
every sequence of n actions a there can be at most n actions b. Finally, the Y in 
ip allows such sequences to be composed or finished in a deadlock state. 

2.2 Binary Decision Diagrams 

When we speak about Binary Decision Diagrams or BDDs we mean in fact 
Reduced Ordered Binary Decision Diagrams. These are compact representations 
of boolean functions. For a proper introduction to BDDs see for example [Bry86] 
or [And97]. In [BCM+92] it is shown how to use BDDs for verification purposes 
in general and for a variant of the modal /i-calculus due to Park and the temporal 
logic CTL in particular. Here we will briefly define BDDs and mention some of 
the standard algorithms for manipulating them. 

Definition 1. Let B = {tt,ff} be the well-known boolean algebra of boolean 
values. 

A BDD over a totally ordered set {a;i, . . . , Xn} of boolean variables is a di- 
rected acyclic graph (DAG) t with the following properties. 

— There are two distinguished nodes called tt and ff which are not labelled 
and have out-degree zero. 

— Every other node is labelled with a variable Xi and has out-degree two. The 
successors are ordered as well, i.e. for a node u they are called low{u) and 
high{u). 

— The DAG has a root and on every path from the root to one of the nodes tt 
or ff the node labels respect the given variable ordering. 

— No two nodes have the same label and the same high and low successors. 

— For no node high and low successor are identical. 

Let BDDn denote the set of all BDDs over n variables, presumably called 

Xi , . . . , Xn ■ 

Every BDD t over n variables represents a boolean function /( : B” ^ B. 
The value b of ft under an assignment of the variables xi, . . . ,Xn is given by the 
path in t 
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— that starts at its root, and 

— at every node labelled with Xi branches to the high-successor if the variable 
is assigned the value tt and to the low-successor otherwise, and 

— ends in the node b. 

We will also write t{bi, . . . ,bn) for this value if bi is the value assigned to Xi for 
i = 1, . . . ,n. Note that t{bi, . . . ,bn) = ft{bi, . . . , 6„) in this case. 

The algorithm of the next section will make use of variable substitutions 
in a BDD. With t[yi/xi, . . . ,yn/xn] we denote the BDD that arises from t by 
simultaneously replacing all node labels Xi with yi for f = 1, . . . , n. 

The converse of the previously sketched result is also true. Every boolean 
function can be represented by a BDD. In fact, the following stronger result 
holds. 

Theorem 1. [Bry86] For every function / : B" ^ B and a given variable 
ordering there is a unique BDD tj representing f. 

[Bry86] also shows how to manipulate BDDs. For example, there are (efficiently) 
computable functions 

— makeBDD: (B* ^ B) ^ BDDn. It takes as an argument a boolean function 
/ and returns the BDD representing it. We will only use it for / = tt or 
/ = ff. In these cases the roots of the returned BDDs are the respective 
distinguished nodes tt or ff. 

— applyBDD: (B ^ B ^ B) — > BDDn BDDn BDDn- It takes as 
arguments a binary boolean operator 0 and two BDDs over n variables 
and returns another BDD over these n variables. Assuming the two input 
arguments represent the boolean functions / and g, applyBDD yields the 
unique BDD representing the boolean function f (B g- 

— existsBDD: V* ^ BDDn — *■ BDDn- It takes as arguments a list of variables 
{xi-^, - - . ,Xi^} C {xi, - . - ,Xn}, and a BDD t representing a function ft- It 

returns the BDD representing the function /' = dxq 3xi^-ft, which 

has as free variables at most those not in {a;q , . . . , Xi^}- Note that 3x-f = 
f[0/x] V f[l/x]- 

— forallBDD: V* ^ BDDn BDDn- Similar to the previous one. However, 

this one returns a BDD representing the function /' = Vxq -ft 

where Va;./ = /[0/a;] A /[1/x]. 

2.3 Encoding a Labelled Transition System 

For the remainder of the paper we assume a T = (5, a S A},L) to 

be fixed. This is not a restriction but simplifies notation since the number of 
variables used in the algorithm presented in the next section depends on the size 
of the transition system at hand. 

Also, w.l.o.g. we assume that |5| = 2” for some n G N Then every state 
has a unique number between 0 and 2" — 1 and one can identify a state s with 
its binary representation si . . . s„, Si G B. 

^ Note that it is always possible to add further states to an LTS that are not reachable 
from any state in it already. 
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procedure MC((p : FLC, t : BDD) 
case tfi of 

q —+ return tq 

T return t 



X error “found free variable” 

(a) f' = applyBDD(and, fa, t[yi/a:i, . . . ,yn/xn]) 

return existsBDD(j/i . . . j/„, t') 

[a] ^ t' = applyBDD( implies, fo, t[y\/xi,. . . ,yn/xn\) 

return forallBDD(pi . . . y„, t') 

•00 V V’l return applyBDD( or, MC(i/)o, t), MC('!/)i, t)) 
'tpo A 'tpi return applyBDD(ond, MC('i/)o, t), MC('i/)i, t)) 
V’o;V’i ^ return MC(-i/io, MC(-i/)i, f)) 
aX.ip a a = fj, then = ff else = tt 

t' = makeBDD(X°) 

= ^[X°/X] 
t” = MC(Xi, t) 
i = 1 

while t' yf t" 
t' = t” 
i = i + 1 

X* = v>[x*"V^] 

t” = MC(X^ t) 
return t' 



Fig. 2. A BDD-based model checking algorithm for FLC. 

T can be represented by |Al| -I- \V\ many BDDs from BDD 2 n- We fix the 
variable names and the ordering as x\ < yi < . . . < Xn < y-n- 

— For every proposition q € V we have a BDD tq over the variables xi, . . . , a;„ 

S.t. tq{x\, . . . , Xn) = tt iff g G L{x\ . . . Xn)- 

— For every action a & A we have a BDD ta over the variables 

yi,...,yn s.t. ta{xi,yi , . . . , a;„, ?/„) = tt iff . . . y„. 

3 The Model Checking Algorithm 

Figure 2 contains the pseudocode of the symbolic model checking algorithm for 
FLC. The procedure MC((p, t) takes as arguments an FLC formula (p and a 
BDD t and returns a BDD t' representing all the states that satisfy (p relative 
to the states represented by t. 

Theorem 2. The algorithm Ts/LCfp, t) terminates for all p € FLC and all 
t G BDDn- 
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Proof. By induction on the structure of if. The claim is easily seen to be correct 
for ip = q, ip = T, ip = X for some X € V, p = (a) or p = [a] for some a € A. 
Using the induction hypothesis on fjo and i/'i it is also not hard to see that the 
claim holds for p = ipo\/ 'ipi, p = -tpo Aipi and p = 'if i - always assuming that 

the involved function calls terminate. 

Suppose now that p = aX.ip. By assumption, any call of MC(t/), t) for any 
t terminates. Thus, it suffices to show that that while-loops terminate. Note 
that its i-th iteration computes the t-th approximant X* to aX.ip. Thus, the 
function that maps t' to t" in each iteration is monotone. According to [Tar55] 
it possesses a fixpoint. This fixpoint is eventually found by the while-loop and 
causes it to terminate. Note that the underlying domain is finite. □ 

For a t G BDDn let |t] := {5i . . . 6„ | t(6i,...,6„) = tt} be the set of all 
variable assignments that evaluate to tt in t. Thus, if t encodes a set of states 
of a transition system then |t] is exactly this set. 

Theorem 3. For all closed p G FLC and all t G BDDn we have: |MC((/?, f)] = 

M(W)- 

Proof. By induction on the structure of p. Note that (p G V is impossible. 

Case p = q: |MC(g, t)] = |tg] = {s € S \ q G Ls} = |g](T) for any T C 5, in 
particular T = |t] . 

Case p = t: |MC(r,t)l = |t] = Ir](|tl). 

Case p = (a): Let t be given and t' = applyBDD(on(i, ta, t[yi/xi, . . . ,yn/xn])- 
Then |t'] = {(s,m) G S \ s-^u and u G |t]}. Therefore |MC((a), t)] = 
|existsBDD(yi, . . . ,y„,t')] = {s G 5 | 3m G 5 s.t. (s,m) G |t']} = {s G 5 | 3m G 5 
s.t. s-2Gm and M G It]} = [(a)l(|t]). 

Case p = [a] .• Similar to the previous case with V instead of 3 and implies instead 
of and. 

Case p = if qM ifi: Let t' = MC('i/;o, t) and t" = M.C{ifi,f). By assump- 
tion we have |F] = IV'oKW) and |t'] = IV'iKW)- Now, |MC('i/'o V f/'i, t)l 
= |applyBDD(or,t',t")l = {s G 5 | s G |t'l or s G |t"]} = |F] U |t"] = 
IV'oKW) U IV'iKW) = M(|t]). 

Case p = if 0 A if I : Similar to the previous case with n instead of U. 

Case p = ifo]if\: Let t' = M.C{ifi,t). By assumption we have |t'] = IV'iKW)- 
Furthermore, let t” = M.C{ifo,t')- Using the hypothesis again we have |t"] = 

IV'oKM) = W(IV’iKW)) = IV’o; V’lKW)- 

Case p = pX.if: Note that by assumption the call of MC(<p,t) computes a 
sequence of approximants A* to p. According to Theorem 2, only finitely many 
approximants are evaluated. For each i we have |MC(Abt)] = |A*](|t]). Ac- 
cording to [Tar55], the sequence of the latter converges to |/iA.i/'] (|t]) which is 
returned by the algorithm. 
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Case ip = vX-ip: The same as the previous case. □ 

The last two cases of this proof show why model checking for FLC still is feasible 
despite being PSPACE-hard. In order to computer the states in lpX.p]{T) it is 
not necessary to compute the least fixpoint of the functional F{f) = 
where f : 2^ ^ 2^ . Using fixpoint approximation this could take a number of 
steps double exponential in |5|. 

In fact, it suffices to compute the value of the least fixpoint of F on the 
argument T. If T = 0 or T = 5 then the number of steps needed in the worst 
case is bounded by |5|. In all other cases the number is bounded by the width 
of the subset-lattice of S which is exponential in |5|. However, in order to fully 
exploit an exponential number of approximations, p has to describe an exponen- 
tially long “walk” through this subset-lattice. This is not easily achieved since it 
necessarily means that |(p] maps exponentially many sets of states to something 
which is neither a subset nor a superset of it. 

An example of such a formula is given in [Lan02b]. One can argue that 
the transition system used there is rather unnatural and constructed since the 
described phenomenon only occurs because it is build of components whose sizes 
are prime numbers. 



4 A Simple Case Study 



In this section we describe a small case study which supports the claim that 
BDD-based FLC model checking is feasible. The implementation was carried 
out using Java"'"'^ and the JavaBDD package [Wha03] which is a front-end to 
the BDD library BuDDy [LN03] written in C. All the tests were run on a machine 
with 3 GB of main memory and two Intel® Xeon"'"'^ processors running at 2.40 
GHz^. The results presented in this section do not have a broad meaning. They 
simply show that non-regular properties can in principle be verified on finite 
models up to a decent size. 

Definition 2. Let V = {tt, ff } and A = {push, pop, reset, idle}. An N -hounded 
stack is aT = (5, {AG | a € A}, L) with S = (0, . . . , 2-^ — 1} and for all s,t G S: 



push , 

s > t 



sAltAUt 



reset , 
S !■ t 



idle , 
S ^ t 



^ff 


t = s -\-l 




t = s — 1 


^ff 


t = 0 


^ff 


s = t 



In other words, an A^-bounded stack models a counter that can be increased and 
decreased between the values 0 and 2-^ — 1, reset to 0 and do nothing. According 

® To the best of our knowledge, the fact that the computer has two processors instead 
is neither a prerequisite nor does it have any effect on the data presented here. 
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to the previous sections, an A^-bounded stack can be represented using 4 BDDs 
over 2 • N variables. We use the variable ordering suggested in Section 2.3. 

Figure 3 depicts the BDDs’ sizes as a function of N. Note that in explicit 
state representation, the size of a bounded stack is exponential in N. 



nodes 




push, pop 
idle 
reset 



Fig. 3. The BDDs’ sizes. 



Example 3. Let 

f/' := vX.[pop\]fl A [reset]-, X A [idle]-,X A 

[push]-, [v Z .[pop] A [reset]-, Z A [idle]-,Z A [pmsh]-, Z-, Z)-, X 

This is essentially the same formula as the one of Example 2 - tailored to N- 
bounded stacks. Thus, f/' says that on every path consisting of any of the actions 
push, pop, reset and idle, the number of pop's so far is always bounded by the 
number of push's so far. Note that this formula is not equivalent to any y^-calculus 
formula. 

It is only state 0 of any iV-bounded stack which has this property. The times 
needed to verify this can be found in Figure 4. 



5 Discussion 

We have shown that - using an appropriate extension of the modal /r-calculus 
- BDDs can be used to verify properties which are not definable in the modal 
/i-calculus. It is known that the model checking problem for FLC is PSPACE- 
hard, even for fixed formulas. But when using symbolic representations and 
calculations on those, it is not foreseeable whether a problem that is intractable 
from a complexity-theoretic point of view does not have instances which can 
be solved in reasonable time. Our case study shows that this is indeed true for 
FLC’s model checking problem - even for instances (see formula ip of Example 3) 
that describe interesting properties. 

Note that the aim of this case study was just to show that in principle BDD’s 
can be used to do model checking for FLC. The implementation consists of ap- 
proximately 230 lines of normal Java code. Another 870 lines are used to make 
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time [sec] 




Fig. 4. The model checking running times. 



the program understand full FLC which would not have been necessary to ac- 
quire the data of the previous section. No optimisation whatsoever has been 
carried out apart from trying two different variable orderings. The implementa- 
tion relies on a generic BDD library and uses the naive model checking algorithm 
of Figure 2. Nevertheless, it was possible to model check a universal formula in 
less than 7 seconds on a transition system which in explicit representation would 
require to visit far more states than there are atoms in the universe - keeping 
PSPACE-hardness in mind. 

Note that the algorithm we used is based on the principle of fixpoint elimi- 
nation: formulas are unfolded whenever necessary. In order to keep notation as 
simple as possible we have not used environments, i.e. an algorithm based on 
fixpoint calculation which can easily be implemented in a functional language. 
Maybe such an algorithm would constitute an improvisation over the one used 
here since fixpoint calculations would not require quadratic time anymore. 
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Abstract. Bounded Model Checking, although complete in theory, has been thus 
far limited in practice to falsification of properties that were not invariants. In 
this paper we propose a termination criterion for all of LTL, and we show its 
effectiveness through experiments. Our approach is based on converting the LTL 
formula to a Biichi automaton so as to reduce model checking to the verification 
of a fairness constraint. This reduction leads to one termination criterion that 
applies to all formulae. We also discuss cases for which a dedicated termination 
test improves bounded model checking efficiency. 



1 Introduction 

The standard approach to model checking an LTL property [17, 9] consists of checking 
language emptiness for the composition of the model at hand and a Biichi automaton 
that accepts all the counterexamples to the LTL property. A competing approach con- 
sists of encoding the problem as propositional satisfiability (SAT) [2]. In this approach, 
known as Bounded Model Checking (BMC), a propositional formula is constructed 
such that a counterexample of bounded length for the LTL formula exists if and only if 
the propositional formula is satisfiable. 

The basic BMC just described is not complete in practice: It often finds a coun- 
terexample if it exists, but it cannot prove that a property passes unless a tight bound 
on the completeness threshold [6] of the state graph is known. Such a bound is difficult 
to obtain. The issue of completeness is addressed by recourse to an induction proof in 
[14] and [7], or by the use of interpolants in [10] so that BMC can be used for both 
verification and falsification of invariants. 

The approach of [14] is based on the observation that if a counterexample to an 
invariant exists, then there is a simple path from an initial state to a failure state that 
goes through no other initial or failure state. Every infinite path that extends this simple 
path violates the invariant. Therefore, an invariant holds if all states of all paths of length 
k starting from the initial states satisfy the invariant, and moreover, there is no simple 
path of length k + 1 starting at an initial state or leading to a failure state, and not going 
through any other initial or failure states. 

This method can be easily extended to prove LTL safety properties. For full LTL, 
one can convert the check for a liveness property into the check of a safety property 

* This work was supported in part by SRC contract 2003-TJ-920. 

R. Alur and D.A. Peled (Eds.): CAV 2004, LNCS 3114, pp. 96-108, 2004. 
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following [13]. However, the conversion doubles the number of state variables. An ap- 
proach that does not incur this penalty is the subject of this paper. We translate the 
given LTL formula into a Biichi automaton and compose the latter with the model as in 
[17, 6]. This step reduces the checking of any LTL property to the one of F G for a 
propositional formula p, on the composed model. 

A counterexample to F G ^p exists if there is a simple path from an initial state of 
the composed model followed by a transition to some state on the path^. If there is no 
simple path from an initial state of length k, then there cannot be a counterexample of 
length fc + 1. This condition is the counterpart of the one for invariants that checks for 
simple paths from initial states. However, there is no strict analog of the states failing 
an invariant in the general case. Hence, the check for no simple paths of length k into 
failure states must be replaced by a criterion that guarantees that no loops satisfying 
certain acceptance conditions may be closed by extending paths of length k. 

In this paper, we present such a criterion to prove LTL properties in general. We 
also discuss a more efficient criterion for a common special case. The translation of the 
LTL formula can be accomplished in several ways. In Sect. 4 we discuss the impact of 
various choices on the efficiency of the model checker. 

As in the case of invariants, the effectiveness of our termination criteria depends on 
the lengths of the simple paths in a state graph. However, our experiments, presented 
in Sect. 5, show that many properties that defy verihcation attempts by either standard 
BMC or BDD-based model checking can be proved by our approach. 

2 Preliminaries 

The goal of LTL model checking is to determine whether an LTL property is satisfied in 
a finite model of a sequential system. The behavior of this sequential system is described 
by a Kripke structure. A Kripke structure K. = {S, S, I, L) consists of a hnite set of 
states S whose connections are described by the transition relation 5 C 5 x S'. If 
(s, t) G 5, then there is a transition form state s to state t in JC. The transition relation 
5 is total: For every state s G S there is a state t G S such that (s, t) G S. I C S is the 
set of initial states of the system. The labeling function L : S ^ 2^^ indicates what 
atomic propositions hold at each state. We write S{s, t) for (s, t) G S; that is, we regard 
i5 as a predicate. Likewise, we write I(s) to indicate that s is an initial state, and, for 
p G AP, p{s) to indicate thatp G L(s). 

Definition 1. A sequence of states (sq, . . . , Sk) forms a path of length k of Kripke struc- 
ture K, if it satisfies 





0<i<k 

The path is initialized if I{sq) holds. A simple path of length k satisfies: 




simplePathi. = path^ A A (Si ^ Sj) . 



Precisely, this is the case when BMC starts from paths of length 0, and increases the length by 
1 every time. 
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The simple path condition can be easily expressed with a number of CNF clauses that 
is quadratic in the length k of the path. Recent work [8] reduces the number of required 
clauses to 0{k log^ k). 

Definition 2. A loop condition is true of a path of length k if and only if there is a 
transition from state Sk to some state of the path. 



Definition 3. The LTL formulae over atomic propositions AP are defined as follows 

— Atomic propositions, true, and false are LTL formulae. 

— if f and g are LTL formulae, then so are ^/, f A g, f \/ g, X f, and / Up. 

An LTL formula that does not contain the temporal operators (X and U ) is propositional. 
We write f R gfor U ~^g), F f for true U f, and G f for false Rp. 

LTL formulae are interpreted over infinite paths. An atomic proposition p holds 
along a path tt = (so,si,...) if p{sq) holds. Satisfaction for true, false, and the 
Boolean connectives is defined in the obvious way; tt \= X / iff |= /, where 
7T* = (sj, . . .); and tt |= /Up iff there exists i > 0 such that tt* |= p, and 
for j <i,TT^ \= f. 

A safety linear-time property is such that every counterexample to it has a finite 
prefix that, however extended to an infinite path, yields a counterexample. A liveness 
property, on the other hand, is such that every finite path can be extended to a model of 
the property. Every linear time property can be expressed as the intersection of a safety 
and a liveness property [1]. 

Though in principle a counterexample to a linear-time property is always an infinite 
sequence of states, for safety properties it is sufficient and customary to present an ini- 
tialized simple path that leads to a bad state — one from which all extensions to infinite 
paths result in counterexamples. For liveness properties, on the other hand, counterex- 
amples produced by model checkers are ultimately periodical sequences of states. Such 
sequences can be presented in the form of an initialized path followed by a transition 
to one of its states. As an example, in a counterexample to the liveness property F p all 
states of the path satisfy ^p. In a counterexample to F G ^p, the transition from the last 
state of the path reaches back far enough that a state satisfying p is included in the loop. 

Definition 4. A Biichi automaton over alphabet S is a quadruple 



where Q is the finite set of states, AfQxSxQ is the transition relation, qo € Q is 
the initial state, and F Q is a set of accepting states (or fair set). 

A run of A over an infinite sequence w = (wq ,Wi, . . .) € is an infinite sequence 
p = (po, pi, . . .) over Q, such that po = qo, and for all i > 0, (pi, Wi, Pi+i) & A. A 
run p is accepting if there exists qj G F that appears infinitely often in p. 

Boolean satisfiability (SAT) is a well-known NP-complete problem. It consists of 
computing a satisfying variable assignment for a propositional formula or determining 
that no such assignment exists. 




0<l<k 



= {Q,^,qo,F) , 
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3 Proving Properties with Bounded Model Checking 

Bounded Model Checking (BMC) [2] reduces the search for a counterexample to an 
LTL property to propositional satisfiahility. Given a Kripke structure 1C, an LTL formula 
/, and a bound k, BMC tries to refute /C |= / by proving the existence of a witness of 
length k to the negation of the LTL formula. 

BMC generates a propositional formula |/C, ^/]fc that is satisfiable if and only if a 
counterexample to / of length k exists; |/C, ~'/]fe is defined as follows; 

1^,^/lfc = -f(so) Apaf/ifc A |^/]fc , (1) 

where |^/]fe expresses the satisfaction of along that path. Of particular interest to 
us are three cases: 



|^Gp]= \f -^p(si) (2a) 

0<i<k 

|^FG^p]= \J {S{sk,si)/\ \J p{s^)) (2b) 

0<l<k l<i<k 

|^Fp]=LfcA f\ ^p{si) , (2c) 

0<z<fc 

where p is a propositional formula. The first of the three cases is encountered when 
checking invariants. The second occurs when checking fairness constraint [4]. It is im- 
portant because model checking any LTL formula / can be reduced to checking for the 
satisfaction of a fairness constraint by translating the LTL formula to a Biichi automa- 
ton. This translation allows us to deal in a uniform manner with all of LTL. However, 
common cases may beneht from special treatment. We illustrate these benefits for for- 
mulae of the form F p, which is our third interesting case. 

For an invariant G p, no counterexample of length greater than or equal to k exists if 
|/C, ^ Gpjfe is unsatishable, and either of the following predicates is unsatisfiable [14]: 

x(k) = I{sq) A simplePath). A (3a) 

0<i<k 

({k) = simplePath). A ^p(sfc) A p{si) ■ (3b) 

0<i<k 

For checking fairness constraints, an unsatishable x(fc) does not guarantee termina- 
tion because all counterexamples may have to go through more than one initial state. 
Therefore, a weakened form must be used: 

x'{k) = /(sq) a simplePath^. . (3a') 

If x'{f^) is unsatishable, then there can be no simple path of length k that can be ex- 
tended to a counterexample by a transition back to a state along the path. For (3b), 
dropping the requirement that all states except the last one satisfy p is not sufficient. 
In the next two sub-sections we develop termination criteria that replace (3b) when 
/ = F G ~^p, and when f = F p. 
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3.1 Proving F G 

Theorem 1. Let /C = (S', <5, 1, L) be a Kripke structure, let p G AP be an atomic 
proposition, and let the following predicates denote sets of paths in 1C: 



a{k) = I{sq) a simplePathf. /\p{sk) (4a) 

P{k) = simplePath^j^i A ~^p{sk) Ap(sfc+i) (4b) 

Pfk) = simplePathi._^_i A A Ap(sfc+i) (4b0 

0<i<k 

|/C,^FG^_p]fc =/(so) ApaiAfc A \J [S{sk,si)A \J p{si)] . (4c) 

0</<fc l<i<k 



Let m be the least value of k for which P'{k) is unsatisfiable, and n the least value ofk 
for which (a V P){k) is unsatisfiable. Then, |/C, ^ F G ~^p\k is unsatisfiable unless it is 
satisfiable for fc < n + m — 1. 

Proof. Since (}'{k + 1) is satisfiable only if fi'{k) is, and /3'(|S| + 1) is unsatisfiable, 
there is a minimum m > 0 such that j3'{m) is unsatisfiable, and for k > m, f3'{k) 
remains unsatisfiable. A similar argument applies to /3{k). 

If a{k) is unsatisfiable, every initialized simple path of length fc in /C ends with a 
state Sfc such that ~^p{sk). In addition, if /3(fc) is unsatisfiable, no simple path of length 
fc that ends in a state Sk such that ~^p{sk) can be extended to a simple path of length 
fc + 1 such thatp(sfe+i). Hence, every initialized simple path of length fc + 1 ends in a 
state Sk+i such that ^p(sfc_|_i). Therefore, (aV/3)(fc+ 1) is satisfiable only if {aV P){k) 
is. Since adS”! + 1) is unsatisfiable, there is a minimum n > m for which (a V P){n) 
is unsatisfiable. In addition, for k > n {a V (3){k) remains unsatisfiable. 

If |/C,^ F G -^pjk is satisfiable for k = n' < n, then the theorem holds for JC. 
Suppose it is satisfiable for k = n' > n, but not for any value of fc less than or equal to 
n. Then 



7 (fc) = I{sq) a simplePathf. A V [S{sk,si)L \f p{si)] (4c') 

0<l<k l<i<k 

is also satisfiable for some fc = n", n < n” < n' . Since every initialized simple path 
of length n" > n satisfies ^(p(s„) V • • • V p{sn")), if there is a path of length k > n 
satisfying 7 (fc), no state Si in (4c') such that p{si) holds can have i > n. Hence, the 
maximum length of such a path is m + n — 1; otherwise, there would be a simple 
path of length m' > m satisfying (4b') from s„ to a state that satisfies p. Therefore, if 
there is no path of length at most m + n — 1 that satisfies 7 (fc), then |/C, ^ F G -^pjk is 
unsatisfiable for any fc > 0. □ 

Theorem 2. There exists a family of structures {ICi}, i > 0, such that the minimum 
value of k for which y{k) is satisfiable is m + n — 1 = 2n — 1. 
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Proof. Structure K,i is defined as follows: 



Si — • ■ • : 



h = {sq} 



k = {(sj,Sj+i) I 0 < j < 2i} 

u {(S2z+1, Si)} 



L(sj) 



{p} if j = * 

0 otherwise . 



For this structure, m = n = i + 1; 7(fc) is satisfiahle for k = 2i + 1 and for no other 
value of k. (Regarding criterion (3a'), ~x!{k) is unsatisfiahle for fc > 2i + 1.) □ 



As shown in Sect. 5, for many models and properties, the termination criterion based 
on Theorem 1 is more effective than the one based on (3a'). 

The conditions of Theorem 1 can be checked efficiently by observing that (4b) is 
unsatisfiahle only if (4b') is, and that the satisfiability of (4a) is immaterial until (4b) 
becomes unsatisfiahle. Initially, it is therefore sufficient to check (4b'); when this be- 
comes unsatisfiahle, one records the value of m and switches to checking (4b). When 
the latter also becomes unsatisfiahle, then one starts monitoring (4a) until the value of n 
is found. Naturally, if (4c) becomes satisfiahle, the process terminates. It is not required 
to check one of (4a)-(4b') for all values of k, though, obviously, skipping some checks 
may lead to trying larger values of k than strictly necessary. 



3.2 Trap States 

Suppose that a predicate r is given such that from a state s that satisfies t(s), no state 
s' satisfying p{s') V ^t(s') can be reached. Then, when checking F G ~^p, (3a') can be 
strengthened as follows: 

x"(fc) = I{so) A simplePathf. A —•T{sk) ■ (3a") 

The model on which F G is checked is normally obtained by composition of the 
given Kripke structure with a Biichi automaton for the negation of an LTL property. 
The automaton may contain a trap state, that is, a non-accepting state with a self-loop 
as only outgoing transition. Such a state is often introduced when making the transition 
relation A of the automaton complete. In such cases, one can take r as the predicate that 
is true of all states of the composition that project on the trap state of the automaton. 



3.3 Proving F p 

Theorem 3. Let K, = (S', 5, 1, L) be a Kripke structure, let p € AP be an atomic 
proposition, and let the following predicates denote sets of paths in JC: 

9{k) = /(sq) a simplePathj^ A A 

0<i<k 

|/C,^Fp]fc = /(so) Apat/ifc ALfc A f\ ^p(si). 

0<i<k 

Let n be the least value of k such that 0(k) is unsatisfiahle. Then |/C, ^ Fpjfc is unsat- 
isfiable unless it is satisfiahle for k < n. 



(5a) 

(5b) 



102 



Mohammad Awedh and Fabio Somenzi 



Proof. Since ffdS'l + 1) is unsatisiiable, there exists a minimum k such that 0{k) is un- 
satisfiable. Let this minimum be n. Since 9{k + 1) implies 6{k), if 9{n) is unsatisfiable, 
for k > n, 9{k) remains unsatisfiable. 

If |/C, ^ f pjk is satisfiable for k = n' < n, then the theorem holds for JC. Suppose 
it is satisfiable for k = n' > n, but not for any value of k less than or equal to n. Then 

a{k) = I{sq) A simplePathj. A Lk A ~^p{si) (5b') 

0<i<k 

is also satisfiable for some k = n", n < n" < n' . Since a{k) implies 9{k), assuming 
that a{k) is satisfiable for k = n” > n leads to a contradiction. □ 

Note that the value of n in Theorem 3 corresponds to the (predicated) recurrence ~^p- 
radius of [13]. 

4 Minimum-Length Counterexamples 

One virtue of the standard BMC algorithm is that it can produce counterexamples of 
minimum length for all LTL properties if the lengths of the paths whose existence is 
checked by SAT starts at 0 and is increased by 1 every time. With BDD-based LTL 
model checking this is not the case for two reasons. The first is that the shortest fair cy- 
cle problem is solved only heuristically [5, 12]. The second reason is that in BDD-based 
LTL model checking, a counterexample is a path in the composition of Kripke struc- 
ture and property automaton. Such a counterexample may be longer than the shortest 
counterexamples found in the Kripke structure. 

Example 1. Figure 1 shows a Kripke structure 1C with S = {a, 6}, <5 = {(a, b), {b, a)}, 
/ = {a}, L{a) = {r}, and L{b) = 0. This structure is composed with a Biichi au- 
tomaton A for the negation of = G(r — > F q). The alphabet 27 of ^ is 2^^. In the 
figure, an arc of the automaton is annotated with the characteristic function of all the la- 
bels for which there is a transition between the two states connected by the arc. Hence, 
A can follow an arc into a state if it reads a letter of the input word that satisfies the 
arc’s formula. The doubly-circled states are accepting. The shortest counterexample to 
K. \= (fi found in /C II A includes three states, aO, 61, and al, even though there is a 
counterexample consisting of two states, a and 6, in /C. 

Even when it is possible to retrieve a shortest path in the Kripke structure from the 
shortest path in the composition — as in the case of Example 1 — the computation on 
K. II ^ is likely to be more expensive than the one on JC alone because the transition 
relation is unrolled more times. 

An LTL formula may be translated into many distinct Biichi automata. Though they 
all accept the same language, they may differ in “style” (labels on the states vs. labels 
on the transitions; one acceptance condition vs. several), or simply in the numbers of 
states and transitions. 

Automata with labels on the transitions react to the evolution of the Kripke struc- 
ture with which they are composed with a delay of one step. This is an obstacle in 
producing shortest counterexamples. Automata with labels on the states do not have 
this disadvantage, but do not guarantee shortest counterexamples either. 
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Fig. 1. The composition with the Biichi automaton affects the length of the counterexample 



{r} 


J 






V 


) 




Fig. 2. The position of the labels in the Biichi automaton and its transition relation affect the 
length of the counterexample 



Example 2. Figure 2 shows the Kripke structure of Example 1 composed with a Biichi 
automaton A' for the negation of p = G(r — > Fq) with labels on the states. The 
transition drawn with a dashed line from State 2 to State 1 can he added or removed 
without changing the language accepted hy A' [16]. However, whether it is present 
or not affects the counterexample found to JC \= p. With the optional transition, the 
language emptiness check applied to /C || A returns the cycle (al,62), which is of 
minimal length. By contrast, when the transition is omitted, the shortest counterexample 
toJC\=p has three states: al, 62, and a2. 

Example 2 shows that adding transitions to an automaton may lead to shorter counterex- 
amples. On the other hand, more transitions may result in longer simple paths which 
may delay the triggering of the termination conditions. To avoid these problems, the 
check for existence of counterexamples of length k is performed according to the orig- 
inal algorithm of [2], while the termination criteria are applied to the composition of 
Kripke structure and Biichi automaton. The overhead of building two distinct models 
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for each value of k is more than compensated by the ability to terminate sooner for 
failing properties. 

5 Experimental Results 

We have implemented the termination criteria of Theorems 1 and 3, (3a'), and (3a") 
in VIS [3,18]. VIS includes an implementation of the BMC algorithm of [2] that uses 
zChaff [11] to check for the satisfiability of propositional formulae. VIS detects LTL 
formulae of special types and treats them accordingly. 

- For invariants and formulae that are syntactically safe according to the definition 
of [15], VIS returns a simple path to a bad state in case of failure. The termination 
criteria (3a) and (3b) of [14] are applied. 

- LTL formulae that contain no temporal operators except X are called bounded. The 
depth of a bounded formula is the maximum nesting level of X operators in it. VIS 
concludes that a bounded formula holds unless it finds a counterexample whose 
length does not exceed the formula’s depth. Propositional formulae are bounded 
formulae of depth 0. 

For formulae that fall into one of the above categories, we use the termination checks 
already implemented. For the others, we have implemented two approaches. 

- The first approach applies the standard BMC algorithm augmented with the termi- 
nation check of (3a") to the composition of the original model and a Biichi automa- 
ton for the negation of the formula. 

- The second approach checks if the given formula is of the form F p, in which case, 
the standard BMC algorithm augmented with the termination check of Theorem 3 
is applied. Otherwise, the termination checks of (3a") and Theorem 1 are applied 
to the composition of the original model and a Biichi automaton for the negation 
of the formula, while standard BMC is applied to the original model to check for 
violations of the property as discussed in Sect. 4. 

The results presented in the following tables are for models that are either from 
industry or from the VIS Verification Benchmark set [18]. For each model, we count 
each LTL property as a separate experiment. We exclude experiments such that all the 
methods we compare finish in less than 1 s. Experiments that take more than 1800 s are 
considered timed out. For all experiments, we set the maximum value of k to 30 and we 
check for termination at each step. The experiments were run on an IBM IntelliStation 
with a 1 .7 GHz Pentium IV CPU and 2 GB of RAM running Linux. The datasize limit 
was set to 1.5 GB. 

Table 1 shows the results of applying four algorithms to LTL properties for which 
VIS did not already implement termination criteria. We compare the performance of our 
method based on Theorems 1 and 3, and the termination criterion (3a") {autjat) to the 
standard BMC algorithm (bmc), to the use of the termination criterion (3a") (bmc-et) 
only, and to the BDD-based LTL model checking algorithm in VIS {Id). 

The first column in Table 1 is the name of the model, the second is the number of 
state variables, and the third is the property number. The remaining columns are di- 
vided into four groups, one for each algorithm. The first column in each group indicates 
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Table 1 . Comparison of aut.sat, bmc, bmc.et, and Itl 



Model 


State 

vars 


# 


aut.sat 


bmc 


bmc.et 


Itl 


N 


k 


Timers) 


N 


k 


Time(s) 


N 


k 


Timers) 


N 


Timers) 


Am2910 


99 


1 


yes 


3 


3.03 


7 


30 


112.58 


7 


30 


557.48 


7 


Timeout 


Bakery 


16 


1 


no 


13 


96.86 


no 


13 


78.05 


no 


13 


91.3 


no 


0.35 


Blackjack 


102 


1 


yes 


1 


1.01 


7 


30 


148.76 


yes 


1 


1.49 


yes 


282.6 


Chameleon 


7 


1 


yes 


3 


0.92 


7 


30 


68.81 


7 


30 


639.57 


yes 


0.1 


Coherence 


1 


1 


yes 


3 


2.32 


7 


30 


40.31 


7 


30 


178.46 


yes 


0.7 


2 


no 


5 


4.32 


no 


5 


0.99 


no 


5 


2.19 


no 


0.9 


3 


7 


21 


Timeout 


7 


30 


1231.59 


7 


19 


Timeout 


yes 


1.0 


D18 


506 


1 


no 


23 


1378.21 


no 


23 


82.68 


no 


23 


342.52 


7 


Timeout 


2 


yes 


0 


0.4 


7 


30 


123.45 


yes 


1 


13.92 


7 


Timeout 


D24 


238 


1 


no 


9 


34.53 


no 


9 


15.12 


no 


9 


29.35 


7 


Timeout 


Dcnew 


10 


1 


no 


6 


3.03 


no 


6 


0.88 


no 


6 


1.94 


no 


0.26 


2 


no 


5 


1.62 


no 


5 


0.38 


no 


5 


1.31 


no 


0.3 


Dekker 


6 


1 


no 


5 


2.61 


no 


5 


0.86 


no 


5 


1.31 


no 


0.09 


Fabric 


85 


1 


yes 


17 


11.21 


7 


30 


21.57 


7 


30 


116.95 


yes 


20.9 


Feistel 


293 


1 


yes 


19 


44.43 


7 


30 


39.35 


yes 


19 


35.54 


yes 


0.6 


Lock 


9 


1 


yes 


7 


1.16 


7 


30 


24.4 


7 


30 


359.63 


yes 


0.13 


Microwave 


4 


1 


no 


2 


0.21 


no 


2 


0.05 


no 


2 


0.11 


no 


0.01 


2 


yes 


3 


0.34 


7 


30 


5.87 


yes 


3 


0.29 


yes 


0.02 


3 


yes 


7 


0.95 


7 


30 


16.72 


yes 


8 


1.3 


yes 


0.1 


MinMax 


27 


1 


yes 


5 


13.16 


7 


30 


183.28 


7 


24 


Timeout 


yes 


0.41 


Nim 


33 


1 


no 


6 


19.54 


no 


6 


4.67 


no 


6 


11.76 


no 


464.1 


Palu 


37 


1 


no 


0 


0.1 


no 


0 


0.05 


no 


0 


0.05 


7 


Timeout 


2 


no 


1 


0.41 


no 


1 


0.14 


no 


1 


0.14 


7 


Timeout 


PLBUS 


307 


1 


yes 


5 


5.53 


7 


30 


155.11 


yes 


6 


15.68 


yes 


1.76 


RetherRTF 


43 


1 


no 


2 


1.31 


no 


2 


0.56 


no 


2 


0.79 


no 


1.03 


2 


7 


20 


Timeout 


7 


30 


1242.82 


7 


25 


Timeout 


no 


1.84 


3 


no 


2 


1.0 


no 


2 


0.54 


no 


2 


0.94 


no 


0.91 


4 


7 


25 


Timeout 


7 


30 


1014.33 


7 


28 


Timeout 


yes 


0.99 


5 


7 


30 


823.12 


7 


30 


182.49 


7 


30 


332.99 


yes 


1.43 


si 269 


37 


1 


yes 


9 


0.95 


7 


30 


21.37 


yes 


9 


0.78 


7 


Timeout 


sl423 


74 


1 


no 


9 


6.61 


no 


9 


2.55 


no 


9 


3.38 


7 


Timeout 


2 


yes 


3 


0.57 


7 


30 


12.21 


yes 


3 


0.49 


7 


Timeout 


Silverman 


17 


1 


yes 


1 


0.14 


7 


30 


13.72 


yes 


1 


0.16 


yes 


0.12 


Smult 


95 


1 


no 


1 


0.0.27 


no 


1 


0.07 


no 


1 


0.09 


no 


37.45 


2 


7 


30 


446.67 


7 


30 


7.52 


7 


30 


382.23 


yes 


35.47 


three.processor 


48 


1 


yes 


3 


2.5 


7 


30 


56.44 


7 


30 


934.23 


yes 


184.6 


Timeout 


31 


1 


no 


0 


0.06 


no 


0 


0.06 


no 


0 


0.06 


no 


1.13 


2 


no 


2 


0.76 


no 


2 


0.35 


no 


2 


0.42 


no 


1.64 


UniDec 


18 


1 


yes 


3 


2.76 


7 


30 


143.84 


yes 


10 


28.81 


yes 


0.16 


2 


yes 


8 


12.6 


7 


30 


112.87 


yes 


9 


18.42 


yes 


0.18 


3 


no 


6 


5.32 


no 


6 


1.58 


no 


6 


2.33 


no 


0.36 


4 


no 


6 


6.93 


no 


6 


3.47 


no 


6 


3.7 


no 


0.15 


UsbPhy 


87 


1 


7 


30 


380.54 


7 


30 


34.59 


7 


30 


130.72 


yes 


192.1 
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Table 2. Comparison for special cases 



Model 


State 

vars 


# 


Theorem 1 and (3a") 


Special cases 


Property type 


N 


k 


Time(s) 


N 


k 


Time(s) 


Arbiter 


16 


1 


7 


30 


432.6 


7 


30 


391.95 


Invariant 


Blackjack 


102 


1 


yes 


1 


2.4 


yes 


1 


1.01 


Fp 


Bpb 


36 


1 


yes 


3 


4.72 


yes 


0 


0.35 


Safety 


D4 


230 


1 


7 


30 


356.7 


yes 


9 


6.87 


Fp 


D18 


506 


2 


yes 


1 


14.66 


yes 


0 


0.4 


Fp 


D21 


92 


1 


7 


24 


Timeout 


7 


24 


Timeout 


Invariant 


D24 


238 


2 


yes 


21 


532.24 


yes 


9 


45.25 


Invariant 


Dekker 


6 


2 


7 


27 


Timeout 


yes 


18 


266.96 


Invariant 


Fabric 


85 


2 


yes 


19 


15.32 


yes 


8 


3.76 


Invariant 


FPMult 


43 


1 


yes 


7 


4.36 


yes 


2 


0.35 


Safety 


PUUS 


307 


1 


yes 


6 


18.89 


yes 


5 


5.53 


Fp 


Rrobin 


5 


1 


yes 


3 


0.2 


yes 


0 


0.02 


Safety 


si 269 


37 


2 


yes 


5 


2.26 


yes 


1 


0.22 


Invariant 


Timeout 


31 


3 


7 


30 


1023.22 


yes 


0 


0.07 


Invariant 


4 


7 


30 


923.45 


yes 


16 


24.91 


Invariant 


UniDec 


18 


2 


yes 


9 


22.09 


yes 


8 


12.6 


Fp 


5 


yes 


8 


17.87 


yes 


8 


1.62 


Bounded LTL 



whether each property passes (yes), fails (no), or remains undecided (?); the column 
labeled k, when present, reports the length of the longest counterexamples that were 
considered. The columns labeled Time give the times in second for each run. Boldface 
is used to highlight best run times. 

As usual, SAT-based model checking does much better than BDD-based model 
checking on some examples, and much worse on others. Within the SAT-based ap- 
proaches, autjat is the only one to prove (as opposed to falsify) a significant number 
of properties. In fact, all passing properties in Table 1 are proved by either autjat or Itl. 

The termination criterion (3a") is not very effective by itself. It only proves 1 1 of 
the 23 passing properties; 5 of them are proved by Theorems 1 and 3 for smaller values 
of k. By contrast, Theorems 1 and 3 prove 18 of the 23 passing properties. 

Augmenting the standard BMC with the termination criteria of Theorems 1 and 3, 
and (3a") helps to prove properties that are hard for the BDD-based method. In Table 1, 
Itl times out before deciding 9 properties, whereas aut_sat times out before deciding 3 
properties. In addition, aut_sat proves some properties faster than Itl. For example, for 
model Am2910, aut^at proves the property true in 3.03 s, while Itl does not reach a 
decision in 1800 s. As another example, for model three^twcessor, aut^at proves the 
property true in 2.5 s, while Itl takes 184.6 s to prove it. 

Table 2 illustrates the importance of checking for special cases. These include in- 
variants, syntactically safe properties, bounded LTL properties, and liveness properties 
of the form F p, where p is a propositional formula. All properties in this table are pass- 
ing properties. The column labeled k has the same meaning as in Table 1. If the value 
of k is 0, the corresponding property is an inductive invariant. 

In Table 2, the general method is slower. There are two reasons for that: The first is 
that using the termination criteria of Theorem 1 and (3a") generate more clauses for a 
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given value of k. The second reason is that longer counterexamples are examined. For 
instance, for Fabric, the general method needs 19 steps to prove the property, while the 
special case takes only 8 steps. As another example, sl269 has a bounded depth of 1 ; 
however, the method based on Theorem 1 and (3a") needs 5 steps to prove the property. 
The termination check of Theorem 3 is better than the termination check of Theorem 1 
when checking properties of the form F p. For example, for model D4, Theorem 1 fails 
to prove the property for k up to 30, while Theorem 3 proves it for k equal to 9 in only 
6.87 s. 

6 Conclusions and Future Work 

We have presented an approach to proving general LTL properties with Bounded Model 
Checking even without prior knowledge of a tight bound on the completeness threshold 
of the graph [6]. The approach translates the LTL property to a Biichi automaton — as 
is customary in BDD-based LTL model checking — so as to apply a uniform termina- 
tion criterion. Experiments indicate that this criterion is significantly more effective 
than the straightforward generalization of the termination criteria for invariants of [14]. 
Compared to the completeness threshold of [6], our bound takes into account the posi- 
tion of the fair states in the graph; hence, it may lead to much earlier termination. The 
experiments also underline the importance of detecting those cases for which special 
termination criteria are known. Comparison with BDD-based model checking shows 
a good degree of complementarity. Neither method proved uniformly better than the 
other, and together, the two could prove all passing properties in our set of experiments. 

Our current implementation uses Biichi automata with labels on the transitions. As 
discussed in Sect. 4, we need to explore the alternative provided by automata with la- 
bels on the states as a way to cause earlier termination. Another aspect needing further 
attention is that our approach only considers Biichi automata with one fair set. Gener- 
alized Biichi automata can be converted to non-generalized, but it is not clear that this 
would be preferable to an extension of Theorem 1 to handle multiple fair sets. 
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Abstract. We present an unfolding-based approach to LTL-X model- 
checking of high-level Petri nets. It is based on the method proposed by 
Esparza and Heljanko for low-level nets [4, 5] and a state of the art paral- 
lel high-level net unfolder described in [15, 13]. We present experimental 
results comparing our approach to the one of [4, 5] and the model-checker 
Spin [12]. 



1 Introduction 

The main drawback of model-checking (see, e.g., [2]) is that it suffers from the 
state space explosion problem. That is, even a relatively small system specifica- 
tion can (and often does) yield a very large state space. To alleviate this problem, 
a number of techniques have been proposed. They can roughly be classified as 
aiming at an implicit compact representation of the full state space of a reactive 
concurrent system, or at an explicit generation of a reduced (though sufficient 
for a given verification task) representation (e.g., abstraction and partial order 
reduction techniques [2]). Among them, a prominent technique is McMillan’s 
(finite prefixes of) Petri net unfoldings (see, e.g., [6, 16, 13, 17]). It relies on the 
partial order view of concurrent computation, and represents system states im- 
plicitly, using an acyclic net. More precisely, given a Petri net A, the unfolding 
technique aims at building a labelled acyclic net Unf^^ (a prefix) satisfying two 
key properties [6, 16, 13]: 

— Completeness. Each reachable marking of A is represented by at least one 
‘witness’, i.e., a marking of Unf^ reachable from its initial marking. Similarly, 
for each possible firing of a transition at any reachable state of A there is a 
suitable ‘witness’ event in Unf^. 

— Finiteness. The prefix is finite and thus can be used as an input to model- 
checking algorithms, e.g., those searching for deadlocks. 

A finite complete prefix satisfying these two properties can be used for model- 
checking as a condensed (symbolic) representation of the state space of a system. 
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Indeed, it turns out that often such prefixes are exponentially smaller than the 
corresponding reachability graphs, especially if the system at hand exhibits a 
lot of concurrency. At least, they are never larger than the reachability graphs 
[6,16,13], 

The unfolding techniques and algorithms described in [6-8, 10, 13, 14, 16-18, 
20] help to alleviate the state space explosion problem when model-checking 
low-level Petri nets. Moreover, the construction of the prefix can be efficiently 
parallelised [11,13]. However, the applicability of these techniques is restricted 
by the fact that low-level Petri nets are a very low-level formalism, and thus in- 
convenient for practical modelling. Therefore, it is highly desirable to generalise 
this technique to more expressive formalisms, such as high-level (or ‘coloured’) 
Petri nets. This formalism allows one to model in quite a natural way many con- 
structs of high-level specification languages used to describe concurrent systems 
(see, e.g., [1]). Though it is possible to translate a high-level net into a low-level 
one and then unfold the latter, it is often the case that the intermediate low-level 
net is exponentially larger than the resulting prefix. Moreover, such a translation 
often completely destroys the structure present in the original model. In [15, 13], 
an approach allowing one to build a prefix directly from a high-level net, thus 
avoiding a potentially expensive translation into a low-level net, was described. 
Experiments demonstrated that this method is often superior to the traditional 
one, involving the explicit construction of an intermediate low-level net. 

Petri net unfolding prefixes have been used for verification of simple safety 
properties, such as deadlock freeness, mutual exclusion and various kinds of 
reachability analysis [10,13,17,18]. (The LTL model checking algorithm pro- 
posed in [22] is quite complicated and a corrected version of the approach re- 
quires memory exponential in the size of the prefix in the worst case [4,5].) 
Recent work [4, 5, 8] suggested a method for checking LTL-X properties of low- 
level Petri nets. It uses a particular non-standard way of synchronising a Petri 
net with a Biichi automaton, which preserves as much concurrency as possible, 
in order to avoid a blow up in the size of the resulting prefix. 

In this paper, we build on the ideas of [4,5, 11, 13, 15] and propose a paral- 
lel algorithm for verification of LTL-X properties based on high-level Petri net 
unfoldings. To our knowledge, no such an algorithm existed before. 

2 Basic Notions 

We use M-nets [1] as the main high-level Petri net model, as we believe that 
it is general enough to cover many other existing relevant formalisms. The full 
description of M-nets can be found in [1] ; here we give only an informal introduc- 
tion, omitting those details which are not directly needed for the purposes of this 
paper. In particular, [1] devotes a lot of attention to the composition rules for 
M-nets, which are relevant only at the construction stage of an M-net, but not 
for model-checking an already constructed one. We assume the reader is familiar 
with the standard notions of the Petri nets theory, such as places, transitions, 
arcs, presets and postsets of places and transitions, marking of a Petri net, the 
enabledness and firing of a transition and marking reachability (see, e.g., [6]). 
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Fig. 1. An M-net system (a) and its unfolding (b). 



2.1 High-Level Petri Nets 

Let Tok be a (finite or infinite) set of elements (or ‘colours’) and VAR be a set 
of variable names, such that Tok n VAR = 0. An M-net is a Petri net such that: 

— Each of its places has a type, which is a subset of Tok and indicates the 
colours this place can contain. We assume that the types of all places are 
finite^. Valid markings of an M-net can be obtained by placing in each of its 
places a (possibly empty) multiset of tokens consistent with the type of the 
place. 

— Each transition is labelled by a guard, which is a well-formed Boolean ex- 
pression over Tok U VAR. 

— Each arc is labelled with a multiset of variables from VAR. 

For a transition t, we will denote by VAR{t) the set of variables which appear 
in its guard and its incident arcs^. 

The enabling and firing rules of M-nets are as follows: when tokens flow along 
the incoming arcs of a transition t, they become bound to variables labelling 
those arcs, forming a (partial) mapping cr : VAR(t) —>■ Tok. If this mapping can 
be extended to a total mapping a' (such an extension can be non-unique) in such 
a way that the guard of t evaluates to true and the values of the variables on 
the outgoing arcs are consistent with the types of the places these arcs point to, 
then t is called enabled and a' is called a firing mode of t. An enabled transition 
can fire, consuming the tokens from its preset and producing tokens in places 
in its postset, in accordance with the values of the variables on the appropriate 
arcs given by a'. 

As an example, consider the M-net system shown in Figure 1(a). At the 
initial marking, ti can fire with the firing modes o' = {ui 1,V2 2, U3 1} 

or a" = {ui 1-^- 1,U2 2,U3 i— > 2}, consuming the tokens from pi and p2 and 

producing respectively the token 1 or 2 in P3. 

^ In general, allowing infinite types yields a Turing-powerful model. Nevertheless, this 
restriction can be omitted in certain important cases [13, 15]. 

If some variable appears in the guard of t but not on its incident arcs, it must be 
explicitly given a finite type. 
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2.2 Translation into Low-Level Nets 

For each M-net system it is possible to build an ‘equivalent’ low-level one (the 
construction is given in [1,15,13]). Such a transformation is called ‘unfolding’ 
in [1], but since we use this term in a different meaning, we adopt the term 
‘expansion’ instead. One can show that the reachability graphs generated by an 
M-net and its expansion are isomorphic, i.e., the expansion faithfully models 
the original M-net. This means that LTL properties of high-level nets can be 
verified by building its expansion. However, the disadvantage of this transfor- 
mation is that it typically yields a very large net. Moreover, the resulting Petri 
net is often unnecessarily large, in the sense that it contains many unreachable 
places and many dead transitions. This is so because the place types are usually 
overapproximations, and the transitions of the original M-net system may have 
many firing modes, only few of which are realised when executing the M-net from 
its initial marking. Therefore, though the M-net expansion is a neat theoretical 
construction, it is often impractical. 



2.3 Petri Net Unfoldings 

The finite complete prefix of a low-level Petri net 17 is a finite acyclic net which 
implicitly represents all the reachable states of S together with transitions en- 
abled at those states. Intuitively, it can be obtained through unfolding E, by 
successive firings of transition, under the following assumptions: (a) for each 
new firing a fresh transition (called an event) is generated; (b) for each newly 
produced token a fresh place (called a condition) is generated. The unfolding is 
infinite whenever S has at least one infinite run; however, if S has finitely many 
reachable states then the unfolding eventually starts to repeat itself and can be 
truncated without loss of information, yielding a finite and complete prefix. 

One can show that the number of events in the complete prefix can never 
exceed the number of reachable states of E, for a precise statement see [6, 13, 16]. 
However, complete prefixes are often exponentially smaller than the correspond- 
ing reachability graphs, especially for highly concurrent Petri nets, because they 
represent concurrency directly rather than by multidimensional ‘diamonds’ as 
it is done in reachability graphs. For example, if the original Petri net consists 
of 100 transitions which can fire once in parallel, the reachability graph will be 
a 100-dimensional hypercube with 2^^^ nodes, whereas the complete prefix will 
coincide with the net itself. 

In [13, 15] unfoldings of high-level Petri nets were defined, and the parallel 
unfolding algorithm proposed in [11,13] was generalised to high-level nets. It 
turns out that the unfolding of a high-level net is isomorphic to the unfolding of 
its low-level expansion. (However, it can be constructed directly from the high- 
level net, without building this expansion.) Thus this approach is conservative in 
the sense that all the verification tools using traditional unfoldings as input can 
be reused with high-level ones. Figure 1(b) shows a finite and complete prefix of 
the M-net in Figure 1(a), coinciding with the finite and complete prefix of the 
expansion. 
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3 An Unfolding Approach to LTL-X Model-Checking 

In [4,5], an approach for unfolding based LTL-X model-checking of safe low- 
level Petri nets was proposed. It makes use of the automata-theoretic approach 
to model-checking [21]. A synchronised net system is constructed as the product 
of the original net system and a Biichi automaton accepting the negation of 
the property to be checked. Then the model-checking problem is reduced to the 
problem of detecting illegal w-traces and illegal livelocks in the synchronised net 
system. Both problems are solved by exploiting finite complete prefixes of the 
unfolded synchronised net system. The main advantage of this approach over 
Wallner’s [22] is its efficiency. Wallner first calculates a complete finite prefix 
and then constructs a graph, but the definition of the graph is non-trivial, and 
the graph grows exponentially in the size of the prefix [4, 5] . The approach of 
Esparza and Heljanko [4,5] avoids the construction of the graph, but builds a 
larger prefix. 

In this paper we follow the approach of [4,5], but we are using strictly safe 
M-nets (i.e., ones which cannot put more than one token in a place) instead of 
safe low-level Petri nets. We will explain our approach by means of the example 
shown in Figure 2(a) (and in the following denoted by T). It is an M-net model of 
a buffer which can store at most 2 items (in places p 2 and P4). To distinguish these 
two items they are represented by coloured tokens 0 and 1 . The M-net contains 

4 places and 3 transitions whereas its expanded low-level net contains 8 places 
and 8 transitions. One can see that if this example is scaled then its expansion 
grows exponentially in the size of the buffer and the cardinalities of place types, 
e.g., if the places have types {0..150} then the expansion contains 604 places and 
23103 transitions, whereas the prefix of such an M-net has just 10 conditions 
and 7 events (in this particular example place types do not influence the size of 
the prefix). Thus the prefix is often much smaller than the low-level expansion 
because the latter can contain many dead transitions and unreachable places 
which disappear in the prefix. Furthermore, for larger systems the expansion 
cannot be constructed within a reasonable time. The main advantage of our 
approach over the one of [4, 5] is that we unfold the M-net directly and do not 
have to build its expansion. 

Let us assume that we want to check the property (p = On(p2 ^ 0), 
i.e., “eventually the item 0 is never stored in the buffer cell P 2 again.” First 
of all, we have to construct a Biichi automaton accepting This means that 
the property p is violated by a system run in which ^{p 2 ^ 0) is true over and 
over again. The corresponding Biichi automaton is shown in Figure 2(b). 
We identify it with an M-net system (also denoted by which has a place 
for each state q of the automaton, with only the initial state Qq having a token. 
(The type of all its places is {•}.) For each transition {q, x, q') of A^^, the M-net 
has a transition {q, x, q'), where q and q' are the input and output places of the 
transition, and x is its guard. This M-net is shown in Figure 2(c), and in the 
following its places and transitions are called Biichi places and Biichi transitions, 
respectively. 
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pi:{0,l} P3:{0,1} 
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P2:{0, 1} P4:{0,1} 





Fig. 2. M-net T: buffer of capacity 2 (a), Biichi automaton constructed for the 
property (p = OD(p 2 ^ 0) (b), the M-net corresponding to A^ip (c), and the product 
net (d). 



In the next step a synchronised M-net system is constructed as the prod- 
uct of T and A^^p. A standard approach would synchronise the system and the 
Biichi automaton on all transitions. A drawback of such a synchronisation is that 
it completely sequentialises the system. Since a strength of the unfolding tech- 
nique is to exploit concurrency, such a product net would not be very suitable for 
unfolding based verification techniques. Therefore, we use the synchronisation 
construction of [4,5] generalised to high-level nets. In order to exploit concur- 
rency, T and are only synchronised on those transitions of T which ‘touch’ 
the places that appear in the atomic propositions of (p. In the following such 
transitions are called visible. For our example this means that T and A^^p only 
synchronise on the visible transitions t\ and ^ 2 , because they touch the place p 2 
which is the only place referred to in 

The resulting product net is shown in Figure 2(d). Unlabelled arcs denote 
that only • can flow along them. 

Construction ofT^cp-' 

1. Put T and A^^, side by side. 

2. Connect each Biichi transition with T in such a way that it ‘observes’ the 
places whose names appear in the guard of the transition. Let {q, cc, q') be 
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a Biichi transition. For each place p which appears in the guard x add arcs 
from p to (g, x, g') and back; both arcs are labelled with “p”. In our example 
this rule only applies to Iq because it is the only Biichi transition whose 
guard refers to p2- Therefore /q and p2 are connected by arcs which are 
labelled with p2- 

3 . Add scheduler places B and S (forming a flip-flop) in such a way that: 

— initially can make a step, and all the visible transitions of T are 
disabled. 

— after a step of only T can make a step. 

— after T has made a visible step, A^^ can make a step, and until that all 
the visible transitions of T are disabled. 

The intuition behind B and S is that A^^p (respectively, T) can make a step 
if there is a token on B (respectively, S). The types of these places are {•}. 
B is connected by arcs going from B to every Biichi transition (ug, ui and 
Iq), and going from every visible transition (ti and ^2) to B. S' is connected 
by arcs going from S to every visible transition (ti and 12), and going from 
every Biichi transition (uq, ui and Iq) to S. Only • can flow along the new 
arcs. Initially, B contains • and S is empty, which allows A^^p to observe the 
initial marking of T. 

Taking a look at the construction of T-,^ so far (see Figure 2 (d) without 
transitions Lq, Li, and L2) it is clear that an infinite transition sequence that 
touches the accepting Biichi place gi infinitely often violates the property ip 
because in this case p2 stores 0 over and over again. To detect such system runs 
we introduce the set I = {t \ t* contains an accepting place of A^^} of all the 
transitions putting a token into an accepting Biichi place. 

Now we claim that an infinite transition sequence of which is flreable 
from the initial marking and contains infinitely many occurrences of /-transitions 
violates p. In the following such sequences are called illegal u-traces. 

Unfortunately, not every sequence that violates the property p is detected by 
illegal w-traces because we synchronise just on the visible transitions. Consider 
a system that contains amongst others a “loop” of invisible transitions. Since 
we just synchronise on the visible transitions this “loop” would not take part 
in the synchronisation at all. Suppose the system reaches a marking M from 
which it is possible to Are a loop of invisible transitions. That is, the system 
would Are an infinite sequence of transitions without A^^p being able to observe 
it. Thus one can imagine a scenario where p is violated but A^^p cannot detect 
this. Therefore, such a situation must be dealt with separately. 

Since the firing of invisible transitions cannot change the marking of the 
visible places (i.e., those appearing in the atomic propositions of p), as well as 
the places of the scheduler and A^p, we simply have to check for each marking 
enabling an infinite sequence of invisible transitions the existence of an accepting 
run of A^p restricted only to those Biichi transitions which are enabled by this 
marking. (In such a case p is violated.) 

Formally, we augment the product net T^p with a set L of livelock monitors 
in such way that a new transition t G L iff there exists a place g in A^p such that 
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q € t* and, starting from q, the accepts an infinite sequence of transitions. 
Then the situation described above can be formalised by an illegal livelock, which 

is an infinite firing sequence of the form rt'^r such that Mg — > M — t G L 
and t' contains only invisible transitions. A formal description for adding L- 
transitions to is given in [5]. 

Theorem 1 (LTL-X Mo del- Checking of M-Nets). Let T be a strictly safe 
M-net system whose reachable markings are pairwise incomparable with respect to 
set inclusion^, and (p be an LTL-X formula. Lt is possible to construct a product 
net system satisfying the following properties: 

— T satisfies p iffT^^ has neither illegal u-traces nor illegal livelocks. 

— The input and output places of the invisible transitions are the same in T 
and T^tp 

From [4, 5] we know that the number of L-transitions can grow exponentially 
in the size of the M-net system T, and so inserting them explicitly into the 
product net would seriously hamper this approach. Therefore we follow 
the trick of [4, 5] and generate such transitions on-the-fiy during the unfolding 
procedure. In order to do so we extend the construction of the product net T^p 
by the following step: 

4. For each Biichi transition, add a copy of it with exactly the same guard and 
preset, but empty postset. (In our example, these new copies of uq, ui and 
Lq are Lq, Li and L 2 , respectively.) All these new copies form a subset of 
transitions which are candidates to become livelock monitors (L-transitions) 
during the unfolding procedure. 

The unfolding procedure makes use of these candidates in such a way that 
every time when an L-labelled event e could be inserted into the prefix it is 
checked whether starting from the marking M which is reached by firing e A^p 
accepts an infinite run. If this is the case then the L-event is inserted into the 
prefix (otherwise it is not), and its preset is expanded to the whole marking M, 
and its postset is a copy of the preset with instances of the visible, scheduler 
and Biichi places removed. This guarantees that from this marking only invisible 
transitions can be fired in the future. 

The check whether A^p can start an accepting run from M restricted to the 
Biichi places can be done as follows. Let q be the Biichi place marked under M. 
We remove from A^p all the transitions whose guards are evaluated to false 
under M (and thus cannot fire). Now, in order to check whether there exists an 
accepting run of A^p it is enough to search for a strongly connected component 
in the obtained graph which is reachable from q and contains at least one /- 
transition. 

® The latter condition is technical and can be removed [5]. 

^ And thus much concurrency is preserved. 
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4 Tableaux System 

We showed in Section 3 that the model-checking problem for LTL-X can be 
reduced to the problem of finding illegal w-traces and illegal livelocks. In [4, 5] 
these problems are solved by considering the prefix as a “distributed” tableaux, 
in which the conditions are “facts” and the events represent “inferences”, with 
the cut-off events being the terminals of the tableaux. We follow this approach, 
and informally describe how illegal w-traces and illegal livelocks are detected. 

The tableaux T for is shown in Figure 3. It contains three terminals, 
ei 5 , ei 6 , and ei 7 , but for the moment let us concentrate on en. Since it is a cut- 
off event of the prefix, there exists a partially ordered execution (po-execution) 
C {C = { 62 } in our example) with the same marking {qo, of as 

the po-execution [ciy] = {ci, . . . , eg, 612 , 614 , 617 }, where [e] is defined as the 
minimal (w.r.t. c) causally closed set of events containing e. Since C C [en], 
the execution [ 617 ] \ C = {ci, 63 , . . . , eg, ei 2 , 614 , 617 } starts and ends at the same 
marking {qo, S,Pi,p\} of T^,p, i.e., it can be repeated arbitrarily many times. 
Moreover, it contains an /-event ei 2 , and thus an illegal w-trace rr'“ is detected, 
where t = uq corresponds to C and r' = t 3 t 2 t 3 UotiUot 2 lotiUi corresponds to 
[ei7] \ C. 

The way of detecting illegal livelocks is quite similar, but in this case a 
terminal e occurring after an L-event cl is considered. This means that there 
exists a po-execution C C [e] (which also contains cl) such that the execution 
[e] \ C starts and ends at the same marking. Now we can use the same argument 
as above to find an infinite trace rr'“, but t' will contain only transitions of 
corresponding to the events occurring after e^. Since an L-transition empties 
the scheduler places of tokens, and they are never put back, no visible or Biichi 
transition of can be enabled after firing an L-transition. Thus r' consists 
only of invisible transitions, i.e., an illegal livelock is detected. (See [4,5, 10] for 
more details.) 

5 Experimental Results 

In this section we present experimental results for the verification of LTL-X prop- 
erties. We compare our implementation (in the following denoted by Punf) with 
the unfolding based LTL-X model-checker for low-level nets UnfSmodels [5], 
and with the model-checker Spin [12]. All experiments are performed on a Linux 
PC with a 2.4 GHz Intel(R) Xeon(TM) processor and 4 GB of RAM. All times 
are measured in seconds. 

We used UnfSmodels version 0.9 of 22nd of October 2003, and invoked it 
with the option -I (for LTL-X model-checking). 

We used Spin version 4.0.7 of 1st of August 2003, and invoked it with the op- 
tions -DMEMCNT=32 (to allow the use of all 4 GB of memory) and -DNOFAIR 
(as no fairness assumptions were needed). Spin’s partial order reductions were 
used in all cases (they are enabled by default). 

We used Punf version 7.03 (parallel). In order to have a fair competition 
against the other tools we invoked Punf in all examples with the option -N=l 
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Fig. 3. The tableaux T for the product net T^^p. The colours of conditions are not 
shown when they are •. The firing modes of events are not shown as they can be 
derived from their pre- and postsets. 
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Table 1. Experimental results for LTL-X model-checking. 
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Spin 


Punf 


Cyclic(15) 


168 


0.08 


Cyclic(16) 


478 


0.07 


Cyclic(17) 


1601 


0.10 


Rino(12) 


75.38 


0.30 


Rino(13) 


274 


0.50 


Ring(14) 


1267 


0.85 


Dp(13) 


559 


0.53 


Dp(14) 


2123 


0.75 


Ph(15) 


16.69 


0.01 


Ph(18) 


1570 


0.01 


Com(20,0) 


232 


0.02 


Com(21,0) 


686 


0.03 


Com(22,0) 


2279 


0.02 


Par(6,10) 


161 


0.02 


Par(7,10) 


mem 


0.04 



Net 


Formula 


Result 


UnfSmdls 


Spin 


Punf 
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□ {p ^ Oq) 


True 


0.19 


0.01 


0.08 


Bds 


□ (p ^ Oq) 


True 


199 


0.71 


8.47 


Dpd(7) 


OD-i(p A q A r) 


True 


507 


2.14 


7.25 


Furnaoe(3) 


OUp 


True 


1057 


1.00 


26.90 


Gasnq(4) 


OOp 


True 


240 


0.14 


8.46 


Rw(12) 


0 

1 

□ 


True 


2770 


0.44 


47.67 


Ftp 


ODp 


True 


>12000 


3.99 


836 


Over{5) 


ODp 


True 


66.01 


0.44 


0.12 


Cyolic(12) 


□ (p ^ Oq) 


True 


0.38 


11.25 


0.08 


Ring(9) 


ODp 


True 


2.13 


1.64 


0.13 


Dp(12) 


Oa-i{p A q A r) 


True 


13.05 


117 


0.36 


Ph(12) 


OD-!(p a q a r) 


True 


0.04 


0.61 


0.02 


Com(15,0) 


□ (p ^ -^ Oq ) 


True 


— 


3.11 


0.02 


Par(5,10) 


□ (p ^ -^ Oq ) 


True 


— 


3.60 


0.02 



Table 2. Experimental resnlts for the parallel mode. 



Net 


Spin 


Punf(1) 


Punf(2) 


C^Om(20,3) 


mem 


8.58 


6.01 


Com(22,3) 


mem 


11.51 


8.51 


Com(25,3) 


mem 


17.29 


12.84 


Par(20,100) 


mem 


8.60 


4.84 


Par(20,150) 


mem 


31.98 


18.28 


buf(20) 


— 


22.70 


16.95 


buf(25) 


— 


142.72 


89.40 



which means that the unfolding procedure is not parallelised, and Punf uses 
only one CPU. 

The benchmarks (except PH(n), COM(n,m) and PAR(n,m)) are taken from 
J.C. Corbett [3]. We used only deadlock-free examples because our semantics for 
LTL-X for deadlocking systems is different from Spin’s semantics and thus the 
results would not be comparable. These benchmarks together with the corre- 
sponding result files are available at 

http : //www. fmi .uni-stuttgart .de/ szs/people/ schroeter/CAV/ cav. tgz. 

In order to have a fair contest against Spin all systems were modelled in 
Promela, all of them (except PH(n), COM(n,m) and PAR(n,m)) by J.C. Cor- 
bett [3], but we scaled up some of the benchmarks. The M-nets used as the 
input for PuNF were automatically derived (using a routine implemented by one 
of the authors) from Spin’s automata representations of the Promela models. 
The low-level nets used as the input for UnfSmodels were obtained from the 
M-nets with the help of the hl2ll utility of the Pep tool [9]. 

Table 1(a) confirms that our approach outperforms UnfSmodels on all ex- 
amples, with an increase of speed up to 550 times (Over(5) example). But we 
should mention that as noted in [5] UnfSmodels is more or less an academic 
prototype, whereas Punf is a high-performance unfolding engine. Com(15,0) 
and Par(5,10) could not been verified with UnfSmodels because either the 
Biichi automaton corresponding to the LTL-X formula for the low-level net (in 
the former case) or the low level net itself (in the latter case) could not be gener- 
ated within reasonable time. Comparing Punf and Spin one can see that Spin 
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performs better on the examples over the line. In contrast, Punf outperforms 
Spin on the examples under the line. We wanted to investigate this further and 
therefore we scaled up these systems and checked them again. 

The results are shown in Table 1(b). They seem to confirm that Spin’s ver- 
ification time grows exponentially with the size of the benchmark, whereas the 
verification time of PuNF remains less than a second in all cases. All these sys- 
tems have a high degree of concurrency, and the results show that our partial 
order technique handles these systems quite well. In contrast. Spin’s partial or- 
der reductions are not very efficient on these examples. In our current theory this 
seems to be related to a reduction condition for LTL-X model-checking which is 
known as the reduction proviso (see, e.g., [2, 19] for further details). 

As it was already mentioned, our unfolding routine supports multiple threads 
running in parallel. To demonstrate this we performed some experiments on a 
two processor machine. 

The results are shown in Table 2. PuNF(n) means that Punf makes use of n 
processors. The results confirm that the process of LTL-X model checking can 
be parallelised quite efficiently. This speeds up the verification, in the best case 
up to n times. Also the results show that these examples are not verifiable with 
Spin because it runs out of memory. (We did not verify the Bup(n) system with 
Spin because it was modelled directly as an M-net). 

6 Conclusions 

We have presented an efficient algorithm for verification of LTL-X properties of 
high-level Petri nets. We followed an approach proposed by Esparza and Heljanko 
in [4, 5] for low-level nets, generalised it to M-nets, and implemented it using a 
state of the art parallel unfolding engine for high-level nets described in [15, 
13]. Finally, we presented results showing that our implementation outperforms 
UnfSmodels on all examples, and beats Spin on some examples, mainly those 
having a high degree of concurrency. To the best of our knowledge, this is the 
first parallel algorithm for verification of LTL-X properties of high-level Petri 
nets based on unfoldings. 
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Abstract. We present a practical compositional interface refinement 
methodology which helps to integrate formal verification with the design 
process. One of the main verification challenges is to keep up with the 
changes to the specifications as the design evolves, and in particular, 
the transformations to the interfaces between the components. Interface 
changes are usually incremental, and therefore, the verification efforts 
after each change should also be incremental in order to be scalable. 
This paper presents a compositional interface refinement methodology 
which addresses all of these issues. 

We have applied this methodology in the design process of a network 
controller that will be used in the Smart Memory project at Stanford. 
The design is split into modules which are modeled as transition systems, 
and the properties of their interfaces are described in the First-Order 
LTL. We perform interface refinement by compositional reasoning with 
the help of an additional interface converter module. Unlike in previous 
approaches, the inverse converter does not need to be constructed, thus, 
making the verification of the refinement step simpler. 



1 Introduction 

The advantage of formal verification over simulation is that one can verify all pos- 
sible behaviors of a design. This is especially important if the design is complex 
and has many corner cases that might not be covered by simulation. However, 
automated methods of formal verification have been limited to relatively small 
designs [15,5,8,1]. In order to verify a large design formally, one can apply com- 
positional reasoning and refinement techniques [6,9] to split the verification of 
the low-level implementation into manageable parts. Compositional refinement 
techniques guarantee that such decomposition preserves the correctness of the 
entire design. 

In this paper, we describe a new compositional interface refinement technique, 
which is similar to the traditional compositional refinement techniques, except 
that when an abstract component is refined to its concrete version, its interface 
is also allowed to change. 
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1.1 Interface Refinement 

It is a long held belief that if one could integrate formal verification into the 
design process, then it would make formal verification easier to perform, and 
improve the design process. Such tight integration was the key to successful 
microprocessor verification by Kroening in [11]. 

Integration of formal verification into a top-down design methodology means 
that a design starts at an abstract level for which exhaustive verification is done. 
Then, as the design evolves to a more detailed implementation, compositional 
techniques can be used in order to verify the more detailed design. 

However, most of the existing compositional refinement approaches require 
that the interfaces among the components do not change in the refinement pro- 
cess. This is not an easy requirement to meet, because engineering specifications 
become more detailed as the design evolves, and this necessitates the refine- 
ment of the interfaces. In order to integrate formal verification into the design 
process, the verification methodology must allow for interface changes that will 
occur during the course of the design. The problem of verifying that the changes 
to the interfaces preserve the correctness of the design is called the interface 
refinement problem. 

In this paper, we present a methodology that addresses the interface re- 
finement problem in the context of compositional verification. This allows the 
verification effort to follow the interface changes during the design and avoids the 
unnecessary restriction on the interfaces imposed by the traditional refinement 
techniques. 

1.2 Our Approach 

The problem of interface refinement has been recognized in the past, and a few 
approaches provide formalization of this problem [3,18,4,7,12]. These results have 
built several theoretical foundations for interface refinement, but their practical 
application in real engineering projects so far have been limited. 

The most practical approach to interface refinement, to the best of our knowl- 
edge, is the method presented by Brinksma et al [3]. Their approach is based on 
a process algebra framework, and the interfaces are refined through the use of a 
converter module that converts between the abstract and the concrete interfaces. 
One of the requirements of this methodology is that the converter must have an 
inverse. 

From our experience with the verification of the Smart Memory network 
controller, we found that the specifications are more naturally expressed in LTL 
rather than in process algebra. It is also natural to define the converter as a 
circuit which translates from the abstract interface to the concrete interface. 
Furthermore, the inverse of the converter may not exist. 

In section 2 we present a generic interface refinement methodology. In par- 
ticular, it works with the LTL specifications and does not require the existence 
of the inverse converter. We illustrate our interface refinement methodology by 
a simple example, then discuss its practical application to the Smart Memory 
network controller in section 3, and section 4 concludes the paper. 




124 



Jacob Chang, Sergey Berezin, and David L. Dill 



2 Compositional Interface Refinement 

Compositional refinement has been extensively studied in the past [6,14], and is 
now a standard approach to verification of complex systems. We briefly remind 
the basics of this approach and introduce the formal notation. 

For simplicity, let us consider a design consisting of just two modules, X and 
Y, connected together in parallel by the parallel composition operator: 
which we deliberately leave undefined. This operator is assumed to be associative: 
(Jf||F)||Z = Jf||(F||Z), so a composition of more than two modules can be 
written simply as Jf||F|jZ. 

Provided that our design X\\Y satisfies the property (j) (denoted X\\Y ^ 
(/)), our goal is to replace each component with a more detailed (or refined) 
implementations X' and Y', respectively, and show that the resulting system 
also satisfies (j). 

More formally, we introduce a refinement relation ^ over modules which pre- 
serves the correctness properties from a certain class (e.g. properties expressible 
in LTL). That is, if |= and X' ^ X, then X' \= ip for any property ip 
from the chosen class. The refinement relation must be transitive: X” ^ X' and 
X' ^ X imply X” < X.\n addition, we require that the parallel composition 
operator is associative and monotonic w.r.t. ^ for both arguments. That is, if 
W ^ X, then X'\\Y ^ X\\Y and Y\\X' ^ Y\\X. 

For this section, we leave the operators |j and ^ undefined, and only assume 
the following algebraic properties: associativity of ||, transitivity of and mono- 
tonicity of |j w.r.t. This allows the results to be used in various contexts. In 
particular, we define these operators in a somewhat unusual fashion, over LTL 
formulas, in section 2.2. 

Since the refined components X' and Y' may be rather complex, in order to 
simplify the verification problem, we would like to reason about each component 
individually, and avoid verifying X'||T' directly. A well-known approach to this 
problem is know as compositional refinement [9], which essentially boils down to 
showing that 

X' YX and Y' Y Y. (1) 

By the monotonicity of || and transitivity of ^ the properties (1) imply A'ljF' ^ 
X\\Y, and hence, A'||F' |= <p, provided that X\\Y \= (p. 

Notice that we can show (1) only if modules X' and Y' have the same in- 
terfaces as X and Y, respectively. However, in real designs, the interfaces often 
change as the specifications get refined, as described in section 1, and it is often 
desirable to refine the interfaces together with the modules. 

The problem of interface refinement consists in verifying the refinement of a 
composite module X\\Y d: X'\\Y' when the communication interface between the 
two components changes. For example, we have module X and Y communicating 
through an abstract interface c as shown in the top portion of figure 1. We want 
to refine X and Y to X' and Y' such that they communicate over a different 
protocol d, but their composition A'||F' should still have a combined property 
(p, given that the abstract system X\\Y satisfies the same property (p. 
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Notice that if we do indeed change the protocol, the direct compositional re- 
finement does not go through, since the refinement of the individual modules (1) 

fails. 




“V* '' "V" 

Y Y 



JY X' 




Fig. 1. Interface refinement by Brinksma et al 



The interface refinement technique suggested by Brinksma et al [3] (trans- 
lated to our notation) is to show the following refinements: 

X'diXWC and Y'<C-^\\Y, (2) 

where C and C~^ are protocol converters (figure 1). From the monotonicity of 
II, it immediately follows that X'||y' ^ X||C'||C'“^||y. Since C~^ is the inverse 
of C, the composition of the two converters C~^\\C is exactly equivalent to the 
identity function (that is, any input to C appears as the output of C~^). This 
implies that X||C'||C'“^||T = X\\Y, and hence, X'||y' ^ ^||T, which is the 
desired result of the refinement verification. 

The problem with this approach is that it is often difficult to find the inverse 
C~^ of the converter C. For instance, if the interface changes from a wide parallel 
bus (abstract interface) that sends data in a single clock cycle to a serial bus 
(concrete interface) sending the same data over several clock cycles, the serial- 
to-parallel converter C would have to “predict the future” in order to output the 
data the moment it enters the inverse converter C~^. One way to get around 
this problem is to use the concept of stuttering equivalence [13]. However, we 
have found that showing that the composition of the converters is the identity 
model is also difficult under this model. 



2.1 Compositional Interface Refinement: Our Approach 

Since the abstract interface and the converters will not appear in the concrete 
implementation, we would like to take advantage of this flexibility in our refine- 
ment proof. Thus, we propose the following method for interface refinement. 
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First, we only perform the refinement on one of the modules, say module 
Y, by showing that C\\Y' ^ Y, where C is a converter module translating 
the abstract protocol c to the concrete protocol d. By compositional reasoning, 
JfllCIIF' ^ (figure 2), and thus, JfllCIIF' also satisfies property ip. Next, 
instead of performing the same refinement step on module X, we show that the 
concrete module X' is a refinement of the composition X\\C. Putting the two 
refinements together completes the refinement chain: X'||F' ^ XljCljF' ^ X|jy. 




Y 




Y 




Fig. 2. Interface refinement: our method 



Note, that our approach requires only one interface converter, in the direction 
of our choice; the inverse converter is not needed. This method is also quite 
general, since it relies on the same algebraic properties of the composition || and 
refinement Y as the traditional compositional reasoning, and therefore, can be 
applied in any framework where such properties are satisfied. In pai’ticular, the 
method is applicable even when the composition and refinement operators are 
defined in non-standard ways (one of which is described in the next section), as 
long as the required algebraic properties are satisfied. 



2.2 First-Order LTL 

In the practical application of our compositional interface refinement method, 
we use a version of First-Order Linear-time Temporal Logic (LTL) similar to 
the one used in the STeP tool [2] at Stanford. LTL properties are preserved by 
trace inclusion, the type of refinement relation where the refined component has 
fewer behaviors, or traces than the abstract component. This logic is expressive 
enough to permit compositional reasoning with interface refinement directly in 
the logic. Additionally, LTL is the specification language of Cadence SMV[16], 
the primary tool we have used to perform the actual verification. 
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Syntax. We assume two disjoint sets of logical and state variables, Vl and Vs. 
The syntax of an LTL formula (j) is given by the following BNF form: 

<^::= p I </>! A02 I I G(/.i I F01 | U02| | </>i 

where p is an atomic formula, (j)i and 02 are formulas and u G Us is a state 
variable. Atomic formulas are predicates over state variables. 



Semantics. The semantics of an LTL formula <j> is defined over all paths in 
a model (Kripke structure) M = {S, So, D), where S is the set of states. 
So V S is the set of initial states, S' x S' is the transition relation, and D 
is the domain of state and logical variables. A state s G S is an assignment of 
values from D to state variables; that is s : Ug — > D. We also introduce a logical 
variable assignment e :Vl ^ D. 

The semantics of a LTL formula 0 is defined over a sequences of states called 
paths 7T that represent execution traces of the model. A state is an assignment 
to variables that occur in 0 and, possibly, to other variables. The temporal 
operators are the same as in propositional LTL. For example, tt |= G 0 means 
that 0 is true everywhere along the path. A quantified formula tt |= V^m. 0 holds 
if 0 holds for all paths tt' such that tt' is the same as tt except for assignment 
to the state variable u. An existential formula tt |= 0 is defined similarly 

except that we only need to find one such tt' which satisfies 0. 

An LTL formula 0 holds in a model M (denoted M |= 0) whenever it holds 
on any path starting from an initial state of M. We say that a formula 0 is valid 
(1= 0) whenever it holds in any model. 



Compositional Refinement in LTL. The refinement property in this frame- 
work is defined as trace inclusion, where trace is an execution path tt of a partic- 
ular module A starting from some initial state of A. Recall that in the previous 
section, we left || and A undefined and only require that they satisfy certain 
algebraic properties. In this section, we define these operator in a non-standard 
way: over LTL formulas. If two modules A and A' satisfy LTL properties 0 a and 
(j)A' respectively, then A' A A is defined by the following formula: 

0A' ^ 0a, 

where u is the vector of all the state variables of (j^A which do not occur in 4>A' • 
Parallel composition of two modules A||i? is simply a conjunction of their 
respective properties: (j)A/\4>B. The fact that a composition of modules satisfies 
some property 0 can now be derived by 

0A A 0B ^ 0. (3) 

These definitions of || and A clearly satisfy the required algebraic properties 
stated in section 2.1. Effectively, the actual modules are replaced by their LTL 
formulas, and all the compositional reasoning is now done completely in LTL. 
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Interface Refinement in LTL. Our approach to compositional interface re- 
finement translates to the LTL framework as follows. Suppose that the properties 
X \= ({)x and Y \= <j)Y have already been verified, and the desired property of 
their composition X\\Y ^ cj) follows from the fact that 

\= 4>X /\4>y ^ (4) 

Additionally, we assume that the concrete modules satisfy their respective 
properties: 



X' ^4>X' Y' (5) 

Recall in section 2.1, in order to show that the concrete design also satisfies (j), 
that is, A'||F' |= (/), it is sufficient to show that C\\Y' ^ Y and X' ^ AljC, 
which translates to LTL as follows: 

4>C A 4>y' 4>y (6) 

(fix' 3®c. (j)x A (t>c (7) 

where (j)c is some formula representing the converter module C. Equation (6) 
does not have the existential quantifier because all the free variables in (jiy appear 
on the left hand side of the implication. Note, that there are no restrictions on the 
formula (j)c- In particular, the actual converter C does not have to be built (and 
may, in fact, be unimplementable), and C \= (j)c does not need to be checked. 

3 Examples and Applications 

In this section, we first demonstrate the practical application of our method on 
a very simple example: composition of two modules passing an L-bit value from 
input to output over an L-bit wide bus, and its refinement to the two modules 
which communicate through a serial bus (1-bit wide). Then we describe a more 
realistic example of an actual design in the context of the Smart Memory Project. 

3.1 A Simple Example for Illustration 

Suppose that we have two modules X and Y that pass the input from a to 
c and from c to 6 as shown in figure (3a), modifying the data with functions fx 
and /y; that is, given the input D, the corresponding output will be fyifxiD)). 
Signal c is a parallel interface that is L bits wide, and is represented as a record 

c : {valid : bool, data : bool[L]} . 

The transmission of data occurs when the valid bit is set to 1, and the data field 
is supposed to contain the valid data in the same cycle. The property that the 
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(a) parallel interface 



(b) serial interface 



Fig. 3. Systems with parallel and serial interfaces 



data D : bool[L] is sent over the channel c at the current cycle is expressed in 
LTL as follows: 



c =1 H = [c. valid = 1 A c.data = D] . 

The property of the composed system is then written as follows: 

0 ^ V(i^ : bool[L]). G (a =1 ^ F [6 =1 fyi/xiD))]). 

Suppose that the abstract components X and Y satisfy the properties (f)x and 
(j)Y, respectively, where 

= VZ9. G (a =1 D ^ F [c ^ fx{D)]) 
cj)Y=yD.G{c^D^F[b^ fY{D)]), 

and we have already established that 

4>X /\4'y ^ (t>- ( 8 ) 

The concrete implementation of the system consists of two components X' 
and Y' which communicate over a serial bus c' as shown in figure (3b). 

A serial interface c' is a record, similar to the concrete interface, only the 
data field is one bit wide: 

c' : {first_bit : bool, bit : bool} . 

A valid transmission of an L-bit data D over the serial bus d starts when the 
first_bit field becomes 1; at this point, the bit field contain the first bit of the 
data, £)[0], and in the subsequent L — I cycles first_bit remains 0 while the 
bit field takes the values of the remaining bits D[l] through D[L — V\. Such a 
transmission is expressed in LTL as the following formula: 

c D = c'.first_bit = 1 A bit = £)[0] 

A X*(c'.first_bit = 0 A c'.bit = D[i]), 
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where is an abbreviation for X The properties of the concrete 

i 

modules can now be written as follows: 

L-l 

(!>x' ^'iD.Q{a^D^F[c' ^ fx{D) A /\ X^X'.d = fx{D)]) 

i^O 

(j)Y' = VZJ. G (c' ZJ ^ F [6 =1 fviD)]). 

Here X'.d is the internal buffer of X' holding the data currently being transmit- 
ted on its output bus c'. The formula <j)x' states that X' will eventually start 
sending the data it receives on the input a, and while the transmission is active, 
the buffer X' .d will hold the entire data being transmitted. In this case, we would 
like to establish that 

4>X' A 4>y' 4>- (9) 

However, we do not want to prove this formula directly, and instead, use the 
already proved property (8). As the first step, we come up with the following 
property: 

(/-c = VZJ. G (c =1 ^ c D) 

which a protocol converter C would satisfy if we were to build it. Note, that the 
converter has no delay between receiving the parallel data and starting to send 
the same data over the serial output. As such, this module is unimplementable, 
since the parallel input may exceed the data rate of the serial output. However, 
this property is sufficient to prove the correctness of the refinement. 

According to our methodology, in order to establish (9), it is sufficient to 
show that the following two formulas hold: 

(j)C A 4>Y' (t)Y (10) 

<t)x' 3c. (fix A 4>c- (11) 

These two refinements are illustrated in figures 4(a) and 4(b). In (11), the 
parallel composition X\\C has a nondeterministic state variable c, which is the 
state of the internal wire between X and C. In order to show that X' is a 
refinement of X\\C, we need to show that for any trace of X' there exists an 
equivalent trace of AUC. A particular trace of AUC can be chosen by a particular 
sequence of values for the variable c. This fact is denoted by the existential 
quantifier over c. When we provide an instantiation for c as 

c = {valid = chfirst_bit, data = X' .d] , 

then it will yield the equivalent trace of X\\C for any trace of X' . Note, that 
this instantiation also prevent overflowing the converter C with excessive data 
on the input, since it can only receive the data as fast as X' can send it, which 
is exactly the rate of the serial bus. From this point, proving (10) and (11) is a 
relatively simple exercise. 
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Fig. 4. Refinement of X and Y. 



3.2 Application to Smart Memory Network Controller 

The Smart Memory project aims at developing a single chip multi processor 
system with reconfiguration capabilities to support various computing models. 
An on-chip network is required in this project to allow different sets of processors 
on the chip to communicate with each other. 

When the design of the Smart Memory network controller was started, it 
was clear that getting it right would be a quite difficult task due to the complex 
interactions among multiple nodes in the network. Therefore, we decided to 
use formal verification in the design process to guarantee the correctness from 
the early stages, and preserve it as the design evolves. The practical demands 
of this engineering effort required the use of effective compositional reasoning 
together with interface refinement techniques, which eventually led to the results 
presented in this paper. 



Smart Memory Network description. The Smart Memory chip is divided 
up into 16 nodes. The Smart Memory network passes short control messages as 
well as large chunks of data between these nodes. The topology of the network 
is such that each node connects to its four neighbors as well as the nodes that 
are two spaces away. Thus each quad has six input and output ports as shown 
in figure 5. Since the messages are of different sizes, we split the messages into 
smaller fixed length data unit called flits which are then transmitted over the 
network. To meet the latency constraints, all the routing information is placed in 
the header flit, and the data flits follow the same route. The route is determined 
dynamically by allocating physical connections, which constitute a limited shared 
resource. Since messages are sent in parallel, it is possible for the system to 
deadlock when two streams of flits are waiting for each other’s resources. 



Problem Statement. To formally verify the correctness of the network, we 
need to prove the guaranteed delivery of messages to the correct destination 
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Network Topology 




Fig. 5. 



without duplication or loss. Proving that the routing algorithm correctly routes 
messages is already an interesting verification problem. The fact that messages 
are not atomic but consist of individual flits makes the verification problem even 
more difficult. 



High level description of the interface refinement process. To simplify 
the verification process, we generalize the network to have arbitrary number of 
nodes and a parameterized connection scheme with certain restrictions on the 
topology (e.g. each input port is connected an output port, there is a route 
between every two nodes, etc.). At the abstract level, we describe the messages 
delivered by the network in terms of packets. The specification of each node 
acting on packets are specified by their properties in the assume-guarantee style. 
These restriction and specifications are easily encoded in LTL. We prove the 
correctness of packet routing given the node specifications without having to 
worry about the complexity of breaking up the packet into flits. 

Then we refine the interface such that the communication between nodes are 
flits instead of packets. The correctness of the refined interface are checked by 
using our tool to process the LTL specifications and converted into verification 
conditions (7) and (10) which are then proved by Cadence SMV. The problem 
of refining the message interface to the flit interface is similar in spirit as the 
example illustrated in section 3.1. However, the fact that each node is connected 
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to many other nodes, the message length can be of different size, and the presence 
of a flow control mechanism make the problem much more complex. 

We then perform another refinement of the interface by adding the flow 
control to the flit interface using the same methodology. 

We estimate that the the first refinement step took us about 10 hours, and 
about 10 hours were spent on expanding the existing toolset, although this does 
not include the time spent in our preliminary work with a similar simpler example 
to develop this methodology. The verification run performed by SMV for each 
verification condition our tool generate all finish within a minute. 



4 Conclusion 

We have demonstrated that interface refinement allows us to reuse the correct- 
ness proof of a design as the designer makes changes to the interface or makes the 
interface more concrete during the design cycle. Interface refinement also allows 
us to apply the divide-and-conquer technique to a large design by incrementally 
making the protocol more complex. 

By applying our technique to the Smart Memory network controller during 
the design process, we have demonstrated that our interface refinement method- 
ology can be used to allow formal verification to be done in parallel with design. 
Currently, the Smart Memory Network controller is still in the design stage and 
no concrete design have been written, therefore we have not made the link be- 
tween the Hardware Description Language (HDL) description of the design with 
the LTL specification we have used for the modules. However, SMV is known for 
proving the LTL specification of the HDL description [ 10 , 17 ], and we foresee 
that it is possible to verify the HDL description of the network controller. 

In addition to the method presented in this paper, we developed additional 
techniques specific to our toolset to perform the interface refinement of the net- 
work controller presented in this paper. These techniques allow us to describe 
the specifications in a more intuitive manner and deal with the more complex 
configuration such as a circular configuration. 

The interface refinement methodology is being developed as a part of the 
larger project to formally verify the Smart Memory network controller, showing 
that formal verification can be applied to a large engineering design. In the fu- 
ture, we hope to demonstrate the feasibility of formally verifying a large design 
project by integrating verification into the design process. We believe that devel- 
oping a methodology with the feedback and the pressure of the design team will 
allow us to address the important issues that are hindering formal verification 
from being commonly used. 

Acknowledgements. The authors would like to thank Mark Horowitz and the 
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also like to thank the anonymous reviewers for helpful comments on the paper. 
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Abstract. Predicate abstraction has been proved effective for verifying several 
infinite-state systems. In predicate abstraction, an abstract system is automati- 
cally constructed given a set of predicates. Predicate abstraction coupled with 
automatic predicate discovery provides for a completely automatic verification 
scheme. For systems with unbounded integer state variables (e.g. software), coun- 
terexample guided predicate discovery has been successful in identifying the nec- 
essary predicates. 

For verifying systems with function state variables, which include systems with 
unbounded memories (microprocessors), arrays in programs, and parameterized 
systems, an extension to predicate abstraction has been suggested which uses 
predicates with free (index) variables. Unfortunately, counterexample guided 
predicate discovery is not applicable to this method. In this paper, we propose 
a simple heuristic for discovering indexed predicates. We illustrate the effective- 
ness of the approach for verifying safety properties of two systems: (i) a version 
of the Bakery mutual exclusion protocol, and (ii) a directory-based cache coher- 
ence protocol with unbounded FIFO channels per client. 



1 Introduction 

Predicate abstraction [15] has emerged as a successful technique for analyzing infinite- 
state systems. The infinite-state systems consist of both hardware and software sys- 
tems, where the state variables can assume arbitrarily large sets of values. Predicate 
abstraction, which is a special instance of the more general theory of abstract inter- 
pretation [9], automatically constructs a finite-state abstract system from a potentially 
infinite state concrete system, given a set of predicates (where a predicate describes 
some property of the concrete system). The abstract system can be used to synthesize 
inductive invariants or perform model checking to verify properties of the concrete sys- 
tem. 

For synthesizing inductive invariants, predicate abstraction can be viewed as a sys- 
tematic way to compose a set of predicates V using the Boolean connectives (A, V, to 
construct the strongest inductive invariant that can be expressed with these predicates. 
This process can be made efficient by using symbolic and Boolean techniques based 
on incremental SAT and BDD-based algorithms [21,7]. Thus, predicate abstraction can 
construct complex invariants given a set of predicates. 

* This research was supported in part by the Semiconductor Research Corporation, Contract 
RID 1029.001. 
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For systems which do not require quantified invariants, it suffices to use simple 
atomic predicates (predicates do not contain V, A or ^). The simplicity of the pred- 
icates make them amenable to automatic predicate discovery schemes [2,6,17]. All 
these methods use the framework of counterexample-guided abstraction refinement [19, 
8] to add new predicates which eliminate spurious counterexample traces over the ab- 
stract system. Automatic predicate discovery coupled with the automatic abstraction 
provided by predicate abstraction makes the verihcation process fully automatic. This 
has been the cornerstone of many successful verification systems based on predicate 
abstraction [2,6, 17]. 

To verify systems containing unbounded resources, such as buffers and memories of 
arbitrary size and systems with arbitrary number of identical, concurrent processes, the 
system model must support state variables that are mutable functions or predicates [25, 
10, 5]. For example, a memory can be represented as a function mapping an address to 
the data stored at an address, while a buffer can be represented as a function mapping 
an integer index to the value stored at the specihed buffer position. The state elements 
of a set of identical processes can be modeled as functions mapping an integer process 
identiher to the state element for the specified process. 

To verify systems with function state variables, we require quantified predicates 
to describe global properties of state variables, such as “At most one process is in its 
critical section,” as expressed by the formula Vz,j : crit(i) A crit(j) z = j. Con- 
ventional predicate abstraction restricts the scope of a quantifier to within an individual 
predicate. System invariants often involve complex formulas with widely scoped quan- 
tifiers. The scoping restriction (the fact that quantifiers do not distribute over Boolean 
connectives) implies that these invariants cannot be divided into small, simple predi- 
cates. This puts a heavy burden on the user to supply predicates that encode intricate 
sets of properties about the system. Recent work attempts to discover quantihed predi- 
cates automatically [10], but it has not been successful for many of the systems that we 
consider. 

Our earlier work [21, 20] and the work by Flanagan and Qadeer (in the context of 
unbounded arrays in software) [13] overcome this problem by allowing the predicates 
to include free variables from a set of index variables X. We call these predicates as 
indexed predicates. The predicate abstraction engine constructs a formula -tf;* consisting 
of a Boolean combination of these predicates, such that the formula yXip* (s) holds for 
every reachable system state s. With this method, the predicates can be very simple, 
with the predicate abstraction tool constructing complex, quantified invariant formulas. 
For example, the property that at most one process can be in its critical section could be 
derived by supplying predicates crit(i), crit(j), and i = j, where i and j are the 
index variables. 

One of the consequences of adding indexed predicates is that the state space defined 
over the predicates does not have a transition relation [20]. This is a consequence of the 
fact that the abstraction function a maps each concrete state to a set of abstract states, 
instead of a single abstract state as happens in predicate abstraction [15]. The lack of an 
abstract transition relation prevents us from generating an abstract trace and thus rules 
out the counterexample-guided rehnement framework. 
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In this work, we look at a technique to generate the set of predicates iteratively. 
Our idea is based on generating predicates by computing the weakest liberal precondi- 
tion [11], similar to Namjoshi and Kurshan [27] and Lakhnech et al. [23]. Our method 
differs from [27] in that we simply use the technique as a heuristic for discovering 
useful indexed predicates. We rely on predicate abstraction to construct invariants us- 
ing these predicates. The method in [27] proposed computing the abstract transition 
relation on-the-fly using the weakest precondition. The methods in [23, 6] can be seen 
as generating new (quantifier-free) predicates using the counterexample-guided refine- 
ment framework with some acceleration techniques in [23]. 

The techniques have been integrated in UCLID [5] verifier, which supports a variety 
of different modeling and verification techniques for infinite- state systems. We describe 
the use of the predicate inference scheme for verifying the safety properties of two 
protocols: (i) A version of the N-process Bakery algorithm by Lamport [24], where 
the reads and writes are atomic and (ii) A extension of the cache-coherence protocol 
devised by Steven German of IBM [14], where each client communicates to the central 
process using unbounded channels. The protocols were previously verified by manually 
constructing predicates in [20]. In contrast, in this work the protocols are verified almost 
automatically with minimal intervention from the user. 



Related Work. The method of invisible invariants [28,1] uses heuristics for con- 
structing universally quantified invariants for parameterized systems automatically. The 
method computes the set of reachable states for finite (and small) instances of the pa- 
rameters and then generalizes them to parameterized systems to construct the inductive 
invariant. The method has been successfully used to verify German’s protocol with sin- 
gle entry channels and a version of the Bakery algorithm, where all the tickets have 
an upper bound and a loop is abstracted with an atomic test. However, the class of 
system handled by the method is restricted; it can’t be applied to the extension of the 
cache-coherence protocol we consider in this paper or an out-of-order processor that is 
considered in our method [22]. 

McMillan uses compositional model checking [25] with various built in abstrac- 
tions and symmetry reduction to reduce an infinite- state system to a finite state version, 
which can be model checked using Boolean methods. Since the abstraction mechanisms 
are built into the system, they can often be very coarse and may not suffice for proving 
a system. Besides, the user is often required to provide auxiliary lemmas or to decom- 
pose the proof to be discharged by symbolic model checkers. The proof of safety of 
the Bakery protocol required non-trivial lemmas in the compositional model checking 
framework [26]. 

Regular model checking [18,4] uses regular languages to represent parameterized 
systems and computes the closure for the regular relations to construct the reachable 
state space. In general, the method is not guaranteed to be complete and requires various 
acceleration techniques (sometimes guided by the user) to ensure termination. More- 
over, several examples can’t be modeled in this framework; the out-of-order processor 
or the Peterson’s mutual exclusion (which can be modeled in our framework) are few 
such examples. Even though the Bakery algorithm can be verified in this framework, it 
requires user ingenuity to encode the protocol in a regular language. 
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Emerson and Kahlon [12] have verified the version of German’s cache coherence 
protocol with single entry channels hy reducing it to a snoopy protocol, which can in 
turn he verified automatically by considering finite instances of the parameterized prob- 
lem. However, the reduction is manually performed and exploits details of operation of 
the protocol, and thus requires user ingenuity. It can’t be easily extended to verify other 
unbounded systems including the Bakery algorithm or the out-of-order processors. 

Predicate abstraction with locally quantified predicates [10,3] require complex 
quantified predicates to construct the inductive assertions, as mentioned in the intro- 
duction. These predicates are often as complex as invariants themselves. The method 
in [3] verified (both safety and liveness) a version of the cache coherence protocol with 
single entry channels, with complex manually provided predicates. In comparison, our 
method constructs an inductive invariant automatically to prove cache coherence. Till 
date, automatic predicate discovery methods for quantified predicates [ 10 ] have not 
been demonstrated on the examples we consider in this paper. 

2 Preliminaries 

The concrete system is defined in terms of a decidable subset of first-order logic. Our 
implementation is based on the CLU logic [5], supporting expressions containing unin- 
terpreted functions and predicates, equality and ordering tests, and addition by integer 
constants. The logic supports Booleans, integers, functions mapping integers to inte- 
gers, and predicates mapping integers to Booleans. 

2.1 Notation 

Rather than using the common indexed vector notation to represent collections of values 
(e.g., V = (ui, f 2 , ■ • ■ , fn)), we use a named set notation. That is, for a set of symbols 
A, we let V indicate a set consisting of a value Vj. for each tl ^ A. 

For a set of symbols A, let denote an interpretation of these symbols, assigning 
to each symbol x G ^ a value o>i(x) of the appropriate type (Boolean, integer, function, 
or predicate). Let I 7_4 denote the set of all interpretations 04 over the symbol set A. Let 
T 4 • OB be the result of combining interpretations and erg over disjoint set of symbols 
A and B. 

For symbol set A, let E(A) denote the set of all expressions in the logic over A. For 
any expression e G E(A) and interpretation 014 G 27_4, let (e)^.^ be the value obtained 
by evaluating e when each symbol x G ^ is replaced by its interpretation o>i(x). For a 
set of expressions v, such that G E{B), we extend (v)^^ to denote the (named) set 
of values obtained by applying erg to each element Vy_ of the set. 

A substitution tt for a set of symbols ^ is a named set of expressions, such that for 
each X G -4, there is an expression tTx in tt. For an expression e we let e denote 

the expression resulting when we (simultaneously) replace each occurrence of every 
symbol x G ^ with the expression tTx. 

2.2 System Description and Concrete System 

We model the system as having a number of state elements, where each state element 
may be a Boolean or integer value, or a function or predicate. We use symbolic names 
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to represent the different state elements giving the set of state symbols V. We introduce 
a set of initial state symbols and a set of input symbols X representing, respectively, 
initial values and inputs that can be set to arbitrary values on each step of operation. 
Among the state variables, there can be immutable values expressing the behavior of 
functional units, such as ALUs, and system parameters such as the total number of 
processes or the maximum size of a buffer. 

The overall system operation is described by an initial-state expression set q° and 
a next-state expression set S. That is, for each state element x G V, the expressions 

G E{J) and 6^. G i?(V U I) denote the initial state expression and the next state 
expression for x. 

A concrete system state assigns an interpretation to every state symbol. The set 
of states of the concrete system is given by Sv, the set of interpretations of the state 
element symbols. For convenience, we denote concrete states using letters s and t rather 
than the more formal oy. 

From our system model, we can characterize the behavior of the concrete system 
in terms of an initial state set Qc — and a next-state function operating on sets 
Nq'. The initial state set is defined as = {(q°)^^ \(Jj G 

Ej}, i.e., the set of all possible valuations of the initial state expressions. The next-state 
function Nq is defined for a single state s as Nc{s) = {{S) |ox G L'l}, i.e., the set 
of all valuations of the next-state expressions for concrete state s and arbitrary input. 
The function is then extended to sets of states by defining Nc{Sc) = UseSc Nc{s). 
We define the set of reachable states Rc as containing those states s such that there is 
some state sequence so, si, . . . , s„ with sq G Q^, s„ = s, and s*+i G Nc{si) for all 
values of i such that 0 < z < n. 



3 Predicate Abstraction with Indexed Predicates 



We use indexed predicates to express constraints on the system state. To define the ab- 
stract state space, we introduce a set of predicate symbols V and a set of index symbols 
X. The predicates consist of a named set <p, where for each p G V, predicate 0p is a 
Boolean formula over the symbols in V U A”. 

Our predicates define an abstract state space S-p, consisting of all interpretations 
Op of the predicate symbols. For k = \V\, the state space contains 2^ elements. We can 
denote a set of abstract states by a Boolean formula ip G E{V). This expression defines 
a set of states {'ijj) = {ap \ = true}. 

We define the abstraction function a to map each concrete state to the set of abstract 
states given by the valuations of the predicates for all possible values of the index vari- 
ables: a(s) = |cTr G 27 A”}- We then extend the abstraction function to apply 

to sets of concrete states in the usual way: a(S'c) = UsgSc 

We define the concretization function 7 for a set of abstract states Sa Q Up: 
7 (S'yi) = {s|V(j;r G Ex ■ {4>)s-crx ^ >5'a}, to require universal quantification over the 
index symbols. 

The universal quantifier in this definition has the consequence that the concretiza- 
tion function does not distribute over set union. In particular, we cannot view the con- 
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cretization function as operating on individual abstract states, but rather as generating 
each concrete state from multiple abstract states. 

Predicate abstraction involves performing a reachability analysis over the abstract 
state space, where on each step we concretize the abstract state set via 7, apply the 
concrete next-state function, and then abstract the results via a. We can view this pro- 
cess as performing reachability analysis on an abstract system having initial state set 
= cr(Qc) ^ next-state function operating on sets: Na{Sa) = a{Nc{'y{SA)))- 
It is important to note that there is no transition relation associated with this next- 
state function, since 7 cannot be viewed as operating on individual abstract states. In 
previous work [20], we provide examples where a pair of abstract states Sa and s'^ 
each has an empty set of abstract successors, but the set of successors of {sa,s[j} is 
non-empty. 

We perform reachability analysis on the abstract system using Na as the next-state 
function: and R^a^ = R\ U Na{Ra)- Since the abstract system is finite, 

there must be some n such that R\ = R'X^^- The set of all reachable abstract states 
Ra is then R^. Let pA G E{V) be the expression representing Ra- The corresponding 
set of concrete states is given by '){Ra), and can be represented by the expression 
: Pa [4>/V]. In previous work [20], we showed that the concretization of Ra (or 
equivalently : pA \4>/V]) is the strongest universally quantified inductive invariant 
that can be constructed from the set of predicates. 

Since there is no complete way for handling quantifiers in first order logic with 
equality and uninterpreted functions [16], we resort to sound quantifier instantiation 
techniques to compute an overapproximation of the abstract state space. The quantifier 
instantiation method uses heuristics to choose a finite set of terms from the possible 
infinite range of values and replaces the universal quantifier by a finite conjunction 
over the terms. More details of the technique can be found in [20] and details of the 
quantifier instantiation heuristic can be found in [22]. The method has sufficed for all 
the examples we have seen so far. The predicate abstraction is carried out efficiently by 
using Boolean techniques [21]. 

The inductive invariant is then used to prove the property of interest by the decision 
procedure inside UCLID. If the assertion holds, then the property is proved. We have 
used this method to verify safety properties of cache-coherence protocols, mutual exclu- 
sion algorithms, out-of-order microprocessors and sequential software programs with 
unbounded arrays [20]. However, most predicates were derived manually by looking at 
failures or adding predicates that appear in the transition function. 



4 Indexed Predicate Discovery 

This section presents a syntactic method for generating indexed predicates, based on 
weakest liberal precondition [11] transformer. A similar idea has been used in [23], 
but they do not consider indexed predicates. As a result, they can use methods based 
on analyzing abstract counterexample traces to refine the set of predicates. In our case, 
we only use it as a syntactic heuristic for generating new predicates. An inexpensive 
syntactic heuristic is also more suited to our approach since computing the inductive in- 
variant is an expensive process [20] for large number of predicates (> 30), even with the 
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recent advances in symbolic methods for predicate abstraction [21]. More importantly, 
the simple heuristic has sufficed for automating the verification of non-trivial problems. 

The weakest precondition of a set of states Sc is the largest set of states Tc, such 
that for any state tc G Tq, the successor states lie in Sc- If 'I'c is an expression rep- 
resenting the set of states Sc, then the expression which represents the WP(ifc') is 
VI : ^c [<i /V] . To obtain this expression in terms of the state variables, one would have 
to perform quantifier elimination to eliminate the input symbols I. In general, elim- 
inating quantifiers over integer symbols in the presence of uninterpreted functions in 
'Pc [<5/V] is undecidable [16]. 

Let us see the intuition (without any rigorous formal basis, since its only a heuristic) 
for using WP for predicate discovery. Consider a predicate (p without any index vari- 
ables. A predicate represents a property of the concrete system, since it is a Boolean 
formula over the state variables. Thus, if WP(</>) (WP(^</>)) is true at a state, then p (re- 
spectively ^(p) will be true in the next state. Therefore the predicates which appear in 
WP((()) are important when tracking the truth of the predicate p accurately. Since com- 
puting WP((()) as a predicate over V is undecidahle in general, we choose predicates 
from Pc [< 5 /V] without explicitly eliminating the quantifiers over I. We later provide 
a strategy to deal with predicates which involve input symbols. This intuition can be 
naturally extended to indexed predicates. In this case, our aim is to generate predicates 
which involve the index symbols. For a predicate p over VUX, the predicates in p [6/V] 
involve X and is a good source for mining additional indexed predicates. 

We start with the set of predicates in the property to be proved. If the final prop- 
erty to be proved is : P{V , X), snq extract the indexed predicates that appear in 
P{V, X). At each predicate discovery step, we generate new predicates from the weak- 
est precondition of the existing predicates. An inductive invariant over the combined set 
of predicates is constructed by predicate abstraction. If the invariant implies the prop- 
erty, we are done. Otherwise, we iterate the process. This process can be repeated until 
no more predicates are discovered or we exhaust resources. 

There are several enhancements over the simple idea that were required to generate 
meaningful predicates. The problems encountered and our solutions are as follows: 

If-then-Else Constructs. To generate atomic predicates, the if-then-else (ITE) con- 
structs are eliminated from the WP expression by two rewrite rules. First, we distribute a 
function application over an ITE term to both the branches i.e. f{ITE{G, Ti, Ei)) — > 
ITE{G, f{Ti), f{Ei)). Second, we distribute the comparisons over ITE to both the 
branches, i.e. ITE{Gi, Ti, Ei) ixi T2 — > (Gi A Ti ixi T2) V (^Gi AEitxi T2), where 
t<ie {<, =}. Since the only arithmetic supported in our modeling formalism is addition 
by constants [5], the final predicates (after all the rewrites are applied) are of the form 
Ti [XI l2 + c, where Ti is an integer symbol or a function application. 

Input Symbols. Since predicates relate state variables, input symbols in I should not be 
considered as part of predicates (the input symbols are universally quantified out while 
computing the WP). Boolean valued input variables can be safely ignored. However, 
integer input variables are different. 

The principal source of integer inputs is the arbitrary index that is generated for 
choosing a process or an instruction to execute non-deterministically. Ignoring predi- 
cates containing inputs can often result in loosing useful predicates. Consider a frag- 
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ment of a code for modeling a simple protocol in UCLID, where pc is a function state 
variable, representing the state of each process and cid is an arbitrary process identifier 
(input) generated to at each step. 

next [pc] := (* next state of 'pc' state variable *) 

Lambda (i) . case (* next state for the ith index *) 

i != cid ; pc(i) ; (* if i != cid, remain unchanged *) 

pc (cid) = A ; B ; (* else if pc(i) = A, update to B *) 

pc (cid) = B ; C ; (* else if pc(i) = B, update to C *) 

default ; pc(i) ; (* else remain unchanged *) 

esac ; 

Let us assume that we are generating the set of predicates for pc(x) = C, where x G 
X. The set of predicates generated from this code fragment is {x = cid, pc(cid) = 
A, . . . , Ignoring predicates containing cid returns an empty set of new predicates. 

To circumvent this problem, we check if an input variable inp appears in any predi- 
cate inp = X, where x G X. In such a case, we say x is a match for inp. We repeat this 
for each input symbol in I. We can have multiple input variables to index into a multi- 
dimensional array. Let X^ be the named set of index variables, such that X^ represents 
the match for input a. 

We first ignore predicates that contain input variables without any matches. For 
any other predicate (j), we generate the predicate (j) \X^ /l\. For the above example, 
the predicates generated are: {a; = x, pc(a;) = A , . . . which generates important 
predicates. Trivial predicates such as x = x are ignored. 

Arithmetic Predicates. In the presence of even simple arithmetic in the model, e.g. 

= vi -f 1 , predicate abstraction generates too many predicates, often generating 
towers such as v [xi V2 -I- 1 ,. . . , v [xi V2 -I- A:. Most often these predicates are not essential 
for the property to be proved. 

To prevent such predicates, we only generate predicates of the form Ti = T2 or 
Ti < T2, where Ti is a integer state variable or a function application. The first time we 
see a predicate T\ txiT2 + c (where c ^ 0 ), we generate the predicates T\ = T2 and 
Ti <T2- We ignore the predicate T\ ixi T2 -I- c for the next step of predicate generation. 
This step is automatically performed. If predicates involving +c are required, the user is 
expected to provide them. This step can be seen as a lightweight acceleration approach. 

Nested Function Applications. While generating predicates for a given index variable, 
say i, we add new index variables to be placeholders for nested function applications. 
For instance, if the predicate discovered is F(G(i)) = Ti, where F and G are state vari- 
ables, we introduce a new index variable j, add the predicate j = G(i) and replace the 
predicate (with the nested function application) with a new predicate F(j) = Ti. All 
nested occurrences of G(z) are replaced with j for subsequent predicates. At present, 
the user determines if new index variables have to be added, as the addition of too many 
index variables often overwhelm the predicate abstraction engine. We are currently au- 
tomating this step. 

Generalizing from a Fixed Index. Suppose we have a state variable v and we have 
a predicate that involves P(v), (where P is a function or predicate state variable). If at 
some point in the future, we see the predicate x = v, where x G X, then we rewrite all 
the predicates which contains v as an argument to a function or predicate state variable 




Indexed Predicate Discovery for Unbounded System Verification 143 

by substituting x for v. For example, the predicate P(v) = 5 is rewritten as P(a;) = 
5. This rule has the effect of generalizing a predicate. It also removes the predicate 
P(v) = 5, since it can be constructed as a combination of a: = v and P(a;) = 5. We have 
found this rule crucial for generating important predicates and also keep the size of the 
predicate set small. 



5 Case Studies 

5.1 Bakery Protocol 

In this section, we describe the verification of mutual exclusion property for a version of 
N-process Bakery algorithm proposed by Lamport [24]. For this verification, we only 
model atomic reads and writes. A version with non-atomic reads and writes has been 
verified with manually supplied predicates [20]. 

Each process has a Boolean variable choosing, a ticket number and a loop index 
j . Each process can be in one of 6 program locations {Lq, . . . A program counter 
variable pc holds the current program location a process is in. All the statements for a 
particular program location are executed atomically. The test (ai , a 2 ) < (bi , 02 ) stands 
for ai < bi V ai = bi A a 2 < b 2 . 

The variables are initialized as: 

choosing := Lambda i. false ; j := Lambda i. 0 ; 
number := Lambda i. 0 ; pc ;= Lambda i. LO ; 

and the overall protocol for the process at index i is described in the following pseudo 
code: 



LO 


: choosing [i] := true; goto LI ; 




LI 


: number [i] ;= max (number [0] number [N-1] ) 

choosingli] := false; j [i] ;= 0; goto L2 ; 


+ 1; 


L2 


: if (! choosing (j [i] ) ) then (goto L3 ; } 
else (goto L2 ; } 




L3 


: if (number (j [i] ) =0 | | 






(number [i] , i) <= (number [j [i] ], j ) ) then 
else (goto L3 ; } 


(goto L4 ; } 


L4 


: if (j [i] < N) then (j[i] ;= j[i]+l; goto L2 ; } 

else (goto L5 ; } 


L5 


: Critical-Section; number [i] := 0 ; goto LO; 





We model the computation of the maximum value of number(O), . . . , 
number(N — 1), in program location Li as an atomic operation. This is modeled by 
introducing a state variable max which takes on arbitrary values at each step, and an 
axiom that says: Vi : max > number(z). At each step, at most one process (at index 
cid) is scheduled to execute. 



Proving Mutual Exclusion. The property we want to prove is that of mutual exclusion, 
namely Vi, j : pc(i) = L5 A pc(j) = L5 (i = j). We start out with a single index 
variable i. We start with the predicates in the property, namely {pc(i) = L5}. 
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The predicates discovered after the first round are: = { j(i) = N, N < j(z), 

pc(z) = L 4 , i = j(i), i < j(i), I = j(*), number(i) = number(l), number(i) < 
number(l), number(l) = 0 , pc(i) = L 3 , choosing(l), pc(i) = L 2 , pc(i) = Li, 
pc(z) = Lq, f = N, N < z, z < 0 }. We introduced a second index symbol I when 
the predicate number(z) = number( j (z)) with nested function application was encoun- 
tered. We replaced this predicate with {I = j (z), number(z) = number(Z)}. Similarly 
the predicates number(z) = number(j (z)) and choosing(j (z)) were replaced with 
number(z) = number(/) and choosing(Z) respectively. We ignored one redundant 
predicate j (z) > N, since both j (z) = N and j (z) < N are present. We also encountered 
the predicate z = cid, and thus we will replace all future occurrences of cid with z in 
subsequent predicates. It took 0.81 seconds to discover that the initial set of predicates 
was not sufficient and to discover these predicates. 

It took 55.8 seconds to find a counterexample with this set of predicates. The next 
set of predicates generated are: = { i = 0, I = 0, I < j(z), number(z) = 0, 

pc(^) = L 5 , pc(/) = Li, ^ = z, / = N, N < ?, / < 0, number(z) < 0, 0 < number(l), 
pc(/) = Lq, 0 < N, N < 0 }. We filtered out quite a number of predicates using 
our rules. First the predicate j(z) + 1 = N gave rise to {j(z) = N, j(z) < N}, both 
of which are already present. Similarly j (z) + 1 < N did not give rise to any new 
predicates. Likewise, the predicates generated from i = j (z) + 1 are also present in 
P^. We ignored predicates which involve max since it is merely a input. The predicate 
I < j{i) was generated from I = j (z) + 1 (Z = j (z) is already present). Finally I = i 
was generated from I = cid by substituting z for cid in the predicate. All the filtering 
happened automatically inside the tool, we describe them to point out the effectiveness 
of the various rules. 

With this set of 33 predicates, we were able to derive an inductive invariant in 471 
seconds and 18 iterations of the abstract reachability. The inductive invariant implies 
the mutual exclusion property. 

5.2 Directory-Based Cache Coherence Protocol 

For the directory-based German’s cache-coherence protocol, an unbounded number of 
clients (cache), communicate with a central home process to gain exclusive or shared 
access to a memory line. The state of each cache can be {invalid, shared, exclusive}. 
The home maintains explicit representations of two lists of clients: those sharing the 
cache line (sharer_list) and those for which the home has sent an invalidation request 
but has not received an acknowledgment (invalidate J.ist). 

The client places requests {reqshared, req^exclusive} on a channel ch_l and the 
home grants {grant^hared, grant exclusive} on channel ch_2. The home also sends 
invalidation messages invalidate along ch_2. The home grants exclusive access to a 
client only when there are no clients sharing a line, i.e. Vi : sharer_list(i) = false. 
The home maintains variables for the current client (current_client) and the current 
request (current_command). It also maintains a bit exclusive^ranted to indicate 
that some client has exclusive access. The cache lines acknowledge invalidation re- 
quests with a invalidateMck along another channel ch_3. At each step an input cid is 
generated to denote the process that is chosen at that step. Details of the protocol oper- 
ation with single-entry channels can be found in many previous works including [28]. 
We will refer to this version as german-cache. 
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Since the modeling language of UCLID does not permit explicit quantifiers in the 
system, we model the check for the absence of any sharers Vi : sharer_list(i) = 
false alternately. We maintain a Boolean state variable emptyJisl, which assumes 
an arbitrary value at each step of operation. We then add an axiom to the system: 
empty Jisl Vi : sharer_list(i) = false '. 

In our version of the protocol, each cache communicates to the home process 
through three directed unbounded FIFO channels, namely the channels ch_l,ch_2, 
ch_3. Thus, there are an unbounded number of unbounded channels, three for each 
client^. It can be shown that a client can generate an unbounded number of requests 
before getting a response from the home. We refer to this version of the protocol as 
german-cache-fifo. 

Proving Cache Coherence. We first consider the version german-cache which has been 
widely used in many previous works [28, 12, 3] among others and then consider the ex- 
tended system german-cache-fifo. In both cases, the cache coherence property to prove 
isVi,j : cache(i) =exclusive Ai f j ^ cache(j) =inva/iV. Details of the verification 
of both german-cache and german-cache-fifo can be found at the authors homepage^. 
Here, we briefly sketch the highlights of the experiments. All the experiments are run 
on an 2.1GHz Pentium machine running Linux with 1GB of RAM. Let denote the 
predicates discovered after k iterations of WP computation. 

Invariant Generation for German-Cache. For this version, we derived two inductive 
invariants, one which involves a single process index i and other which involves two 
process indices i and j. For brevity, we will only describe the dual indexed invariant. 

For the dual indexed invariant, we start off with the predicates in the property, 
namely = { pc(i) = exclusive, pc(j) = invalid, i = j }. The predicates discovered 
in subsequent iterations are listed below: 

= { ch2(i) = grantjsxclusive, ch2(i) = grantshared, ch3(i) = empty, ch2(z) = 
invalidate, ch2(j) = grantjexclusive, ch2(j) = grantjhared, ch3(j) = empty, 
ch2(j) = invalidate }. 

P^ = { ch2(i) = empty, current_command = req-exclusive, i = current_client, 
invalidatej.ist(z), j = current_client, current_command = reqshared, 
exclusive_granted, ch3(i) = invalidate Mck, current_command = 
empty, ch2(j) = empty, invalidatej.ist(j), ch3(j) = invalidate Mck}. 

P^ = { sharer_list(f), chl(i) = empty, chl(i) = req^exclusive, chl(z) = 
reqshared, sharer_list(j) }. 

The inductive invariant which implies the cache-coherency was constructed using these 
28 predicates in 1266 seconds using 15 steps of abstract reachability. The entire process 
took less than 2000 seconds of CPU time. 

Invariant Generation for German-Cache-Fifo. The addition of an unbounded number 
of FIFOs increases the complexity of the model. We constructed an inductive invariant 
(which implies cache coherence) with a process index i and an index j for the channels. 

* Our current implementation only handles one direction of the axiom, Vi : emptyjisl => 
sharer_list(i) = false, which is sufficient to ensure the safety property. 

^ The extension was suggested hy Steven German himself. 

^ http ; / /www . ece . emu . edu/~shuvendu/papers/cav04a- submit . ps 
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We needed to add two predicates for the FIFOs manually, which contain constant offsets 
(e.g. X < y — 1). The time taken to construct the inductive invariant with 29 predicates 
was 3 hours. 

6 Conclusions 

In this work, we have demonstrated that predicate abstraction with indexed predicates 
coupled with simple heuristics for indexed predicate discovery can be very effective 
for automated or systematic verification of unbounded systems. The verification is car- 
ried out without knowledge about the operation of any of the protocols. The technique 
has also been applied for the systematic generation of invariants for an out-of-order 
microprocessor engine and a network protocol. 

There is a lot of scope for improving the performance of invariant generation pro- 
cedure by finding a minimal set of predicates. For instance, the inductive invariant for 
german-cache-fifo could be computed using 26 manually specified predicates in 581 
seconds. Hence, there is a scope of almost 20X improvement that can be possibly ob- 
tained by a suitable selection of predicates. We are also experimenting with proof-based 
predicate discovery, where the proof that no concrete counterexample exists within a fi- 
nite number of steps is used to discover new predicates. 
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Abstract. Separation Logic consists of a Boolean combination of predi- 
cates of the form Vi >Vj-\-c where c is a constant and Vi , Vj are variables 
of some ordered infinite type like real or integer. Any equality or in- 
equality can be expressed in this logic. We propose a decision procedure 
for Separation Logic based on allocating small domains (ranges) to the 
formula’s variables that are sufficient for preserving satisfiability. Given 
a Separation Logic formula ip, our procedure constructs the inequalities 
graph of <p, based on p’s predicates. This graph represents an abstraction 
of the formula, as there are many formulas with the same set of predi- 
cates. Our procedure then analyzes this graph and allocates a range to 
each variable that is adequate for all of these formulas. This approach 
of finding small finite ranges and enumerating them symbolically is both 
theoretically and empirically more efficient than methods based on case- 
splitting or reduction to Propositional Logic. Experimental results show 
that the state-space (that is, the number of assignments that need to 
be enumerated) allocated by our procedure is frequently exponentially 
smaller than previous methods. 



1 Introduction 

Separation Logic, also known as Difference Logic, consists of a Boolean combi- 
nation of predicates of the form Vi l> vj + c where l> G {>, >}, c is a constant, 
and Vi,Vj are variables of some ordered infinite type like real or integer. All 
the other equality and inequality relations can be expressed in this logic. Unin- 
terpreted functions can be handled as well since they can be reduced to Boolean 
combinations of equalities [2]. In this paper, we consider Separation Logic with 
integer variables and constants only, assuming that only minor adaptations are 
needed for other cases. Further we consider only predicates with > relations as 
it does not reduce expressivity in any way. 

Separation predicates are used in verification of timed systems, scheduling 
problems, and more. Hardware models with ordered data structures have in- 
equalities as well. For example, if the model contains a queue of unbounded 
length, the test for head < tail introduces inequalities. In fact, as observed by 
Pratt [7], most inequalities in verification conditions are of this form. Further- 
more, since theorem provers can decide mixed theories (by invoking an appropri- 
ate decision procedure for each logic fragment [8]), an efficient decision procedure 
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for Separation Logic will be helpful in verification of any formula that contains 
a significant number of these predicates. 

There are various known methods for solving Separation Logic (see tools such 
as CVC [11] and mathsat [3] that solves this logic), which we survey in detail in 
the full version of this article [12]. Let us just mention here the most recent one, 
which is based on a reduction to Propositional Logic. In [10, 9] such a reduction 
is proposed, based on an analysis of a graph derived from the formula, called the 
inequalities graph. The inequalities graph is based on the formula’s predicates, 
regardless of the Boolean connectives between them. It encodes each predicate 
of the form x > y + c with a new Boolean variable e%y, and then gradually 
removes nodes from the graph while adding transitivity constraints (similar to 
the Fourier-Motzkin technique). The following example illustrates this method. 
Consider the formula ip: x>y+lA{y>z + 2Vz>x+l). As a, first step, 
abstract this formula with Boolean constraints, i.e. (p' : e^y A (e^^ V el^). 

Next, nodes (variables) are eliminated one at a time while adding proper 
constraints to the formula. Given the order x, y, z it first eliminates x, deriving 
from the first and third predicates the new constraint z — l>y + l. It conse- 
quently adds the constraint e]],^ A e],^ e^y. Eliminating y it derives from the 
second and fourth constraint the new unsatisfiable constraint z — 2 > z -I- 2 and 
adds, accordingly, the constraint A e^y false . This procedure may result 
in an exponential number of constraints and a quadratic number of variables. In 
contrast, the Range-Allocation approach which we present now, requires in the 
worst case n • log n variables and no additional constraints. 



Our Approach. Our approach is based on the small model property of Separa- 
tion Logic. That is, if a formula in this theory is satisfiable, then there is a finite 
model that satisfies it. Furthermore, in the case of Separation Logic there exists 
an efficiently computable bound on the size of the smallest satisfying model. This 
implies that the given formula can be decided by checking all possible valuations 
up to that bound. 

In the case of predicates of the form x > y (since we treat integers and weak 
inequalities this is the same as a; > y -I- 1) the range [1 . . . n] is sufficient, where 
n is the number of variables. In other words, it is possible to check for the satis- 
fiability of such formulas by enumerating all n" possible valuations within this 
range. Informally, the reason that this range is sufficient is that every satisfying 
assignment imposes a partial order on the variables, and every assignment that 
preserves this order satisfies the formula as well. Since all orderings are possible 
in the range [1 . . . n], this range is sufficient, or, as we later call it, it is adequate. 
In the case of full Separation Logic (when there are arbitrary constants), it was 
shown [4] that in the worst case a range [1 ... n -I- maxC] is required for each 
variable, where maxC can be as high as the sum of all constants in the formula. 
This result leads to a state-space of {n + maxC)^. These results refer to a uni- 
form range allocation to all variables, regardless of the formula structure. In this 
article we investigate methods for reducing this number, based on an analysis of 
the formula’s structure, which typically results in non-uniform range allocation. 
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A similar approach was taken in the past by Pnueli et al. [6] in the context of 
Equality Logic (Boolean combinations of equalities). This article can be seen as 
a natural continuation of that work. 

As an example of the reduction in state-space that our method can achieve 
consider the ‘diamond’ graph shown below. For such a graph with n nodes and 
all edge weights equal to 1, the uniform range allocation results in a state-space 
of size 0(n"). In contrast, our approach allocates a single constant to one node, 
and 2 values to all the rest, which results in a state-space of size 0(2”). If the 
graph has arbitrary edge weights, the state-space resulting from a uniform range 
allocation can grow up to 0((n+maxC)^), while in our approach it would remain 
0(2”) (it will, however, increase the values themselves, which has a relatively 
minor effect on performance). 




2 Problem Formulation 

Let Vars(ip) denote the set of variables used in a Separation formula ip over the 
set of integers Z. A domain (or range) R{(p) of a formula is a function from 
Vars{ip) to 2^. Let Vars{(p) = {t>i, . . . Vn} and |i?(r’i)| be equal to the number of 
elements in the set R(vi). The size of domain R{<p), denoted by |i?(v?)| is given 

by |i?(v?)| = |.R('yi)| • l^('t' 2 )| |.R(i’n)|- Now, let SATji{(p) denote that ip is 

satisfiable in a domain R. Our goal is the following: 

Find a small domain R such that 

SATniip) ^ SAT^iip) ( 1 ) 

We say that a domain R is adequate for ip if it satisfies formula (1). It is straight- 
forward to see that if SAT^^ip) holds then the minimal adequate domain has size 
1: simply assign the variables in Varsipp) a constant value according to one of 
the satisfying assignment. Thus, finding the smallest domain for a given formula 
is at least as hard as checking the satisfiability of p. So, rather than examining p 
we investigate the set of all Separation formulas with the same set of predicates 
as p, denoted by ^{p). Thus, our new, less ambitious goal is the following: 

Given a Separation formula p, find the smallest domain R which is ad- 
equate for ^{p). 

The problem of finding the smallest adequate domain for d>{p) is still too compu- 
tationally expensive (it is still exponential, although we will not prove it here). 
We will therefore concentrate on finding over-approximations which are easier 
to compute. As was previously indicated, our solution is based on a graph-based 
analysis of the formula’s structure. 
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Input Formula. We define a normal form for Separation logic as follows: 1 ) 
‘greater-equal than’ (>) is the only allowed predicate, and 2) there are no nega- 
tions in the formula. Every Separation logic formula can be transformed to this 
form by first translating it to NNF, reversing negated inequalities, reversing 
‘less than’ (<, <) predicates and finally, replacing strong inequalities of the form 
X > y + c with weak inequalities of the form x> y + c + 1 . 



A Graph Represeutatiou. Given a Separation Logic formula (/?, we construct 
the inequalities graph Gip{V,E) as follows (we will write Gip from now on). 

Defiuitiou 1 (Inequalities graph). The inequalities gra,plh.G^p{V,E) is con- 
strueted as follows: Add a node to V for eaeh v € Vars{ip). For each predicate 
of the form x > y + c in (p, add an edge to E from the node representing x to 
the node representing y, with a weight c. 

We will say that a domain is adequate for a graph G^p if it is adequate for 

Defiuitiou 2 (Cousisteut subgraphs). A subgraph of an inequalities graph 
Gp is consistent if it does not include a cycle with positive accumulated weight 
(^positive cycle for short). 

Intuitively, a consistent subgraph represents a set of edges (predicates) that can 
be satisfied simultaneously. Restating in graph-theoretic terms, our goal is: 

Given G^,, find a domain R for each node in V such that every consistent 
subgraph of G^, can be satisfied from values in R. 

Example 1 . The set of constants associated with each vertex in the following 
graph constitute an adequate domain. Each edge is assumed to have weight 1 . 
There exists a solution from these sets to every consistent subset of edges in the 
graph. 



{6,10} X (5,9) 
hxi X2 

1 1 

X_A X3] l 

{7} 1 {4,8} 

For example, the set of edges {{xi,X2), {X2,X3), (0:3, 14)} can be satisfied by 
the assignment xi = 10 , X2 = 9,2:3 = 8,2:4 = 7 . Note that there is no need to 
satisfy the subset containing all the edges because it is inconsistent. 

It can be shown that for every simple cycle there exists an adequate domain 
with size In our example the size of the domain is 8. If we try to reduce 

this size, for example by setting R{xi) = {6}, the domain becomes inadequate, 
because the subset (2:1, 2:2), (2:2, 2:3), (2:3, 2:4) is unsatisfiable under R. □ 
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It is obvious that we should find small ranges with an overhead smaller than 
what is saved compared to the other methods mentioned earlier. In this respect 
we were not entirely successful, as in the worst case our algorithm is exponential 
(it requires at some point to look for all negative cycles in the graph). In practice, 
however, we did not encounter an example that takes a long time to allocate 
ranges for. It is left for future research to find polynomial approximations to our 
algorithm. 



3 The Range Allocation Algorithm 

As mentioned in the introduction, it is known that for every inequalities graph 
(with all edge weights equal to 1), the range 1 . . . |y| is adequate, resulting in 
a state-space of size However Example 1 shows that analysis of the 

graph structure may yield smaller state-spaces. For example, if a graph has 
two unconnected subgraphs then they can be allocated values separately, hence 
reducing the overall state-space. In fact, it is possible to analyze every Strongly 
Connected Component (SCC) separately. The edges between the SCCs can then 
be satisfied by appropriately shifting the allocated domains of each SCC. Thus, 
a solution to this problem for SCCs implies directly a solution for the general 
problem. 

The pseudo-code in Figure 1 shows the overall structure of our algorithm. 
The main procedure. All ocate-Graph receives a graph as input and returns a 
graph where all its nodes are annotated with adequate ranges. For each node x 
we denote the corresponding range by R{x). It allocates values for SCCs with the 
procedure Al I o cat e-SCC a,nd then shifts them so that these ranges are valid in 
relation to the other SCCs present in the graph. Given an SCC S, A llocate-SCC 
calls itself recursively on a smaller SCC derived from S (the recursion stops 
when S is the trivial SCC comprised of a single node) . When returning from the 
recursive call, the values assigned to the smaller SCC are used to allocate values 
for all the nodes in SCC S. The description of Allocate-SCC, which requires 
several definitions, appears in the next subsection. 

3.1 Range Allocation for SCCs 

The goal of Allocate-SCC is to annotate each node a; of a given SCC S with a 
finite and adequate range R{x) (not necessarily continuous). The primary source 
of difficulty in assigning adequate values to SCCs is the presence of cycles. We 
introduce the notion of outpoints to deal with cycles in the SCC. 

Definition 3 (Cutpoint-set). Given a directed graph G{V,E), a set of nodes 
V C V is a Cutpoint-set if removing them and their adjacent nodes makes the 
graph cycle free. 

The notion of outpoints is also known in the literature as Feedback Vertex Sets 
[5]. Finding a minimal set of outpoints is an NP-Hard problem, so our imple- 
mentation uses a polynomial approximation (several such approximations are 
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Annotated-Graph Allocate-Graph(directed graph G) 

1. For each non-trivial SCC SgG, S= Allocate-SCC (S) 

2. Following the partial-order forest, allocate values to non-SCC 
nodes and shift the allocated values of the SCCs so they satisfy 
all predicates. //If every SCC is contracted to a single node, the 

// resulting graph is a forest 

3. Return the Annotated graph G. 

Annotated-Graph Allocate-SCC(SCC S) 

1. If S' has a single node x assign R(x) = {0} Euid return S. 

2. Find the set of outpoints C of S // See Definition 3 

3. Construct the Cutpoint-graph Sc II See Definition 4 

4. Allocate-SCC (Sc) ■ 

5. For each regular node x\ll See description of the four phases in Sec. 3.2 

(a) (Phase 1) Find the cycle-values of x 

(b) (Phase 2) Find the dfs-values of x 

(c) (Phase 3) Let {Cix,G 2 x, ■ ■ ■ ,Gnx} be the set of outpoints that CcUi 
reach x. Then assign 

7?(a:) = [J ({m+w|m € R[Gix)/\{v £ {cycle-values%.^{x)Udfs-values%.^{x)))}) 

Cix 

(d) (Phase 4) Add a dfs-value corresponding to the virtual level 

6. Return the annotated SCC S. 



Fig. 1. Procedures Alio cat e-Graph calls Allocate-SCC for each SCC. Both procedures 
receive a directed graph as input, and annotates each node x in this graph with a set 
of adequate values R{x). 



described in the above reference) . We will refer to non-cutpoint nodes simply as 
regular nodes. This distinction refers to the current recursion level only. As we 
will soon see, all graphs at recursion levels other than the highest one comprise 
only of the outpoints of the original SCC S. 

Next, we define a collapsing of an SCC S on to its Cutpoint-set, which we 
call the Cutpoint-graph of S: 

Definition 4 (Cutpoint-graph). Given an SCC S and a Cutpoint-set C of 
S, a outpoint graph of S with respect to C zs a directed graph Sc{C,E) such 
that for u,v € C, u ^ v, edge {u, v) € E with weight w if and only if there is 
a path in S from u to v not passing through any other vertex in C and with an 
accumulated weight w 

Note that according to this definition there are no self loops in Cutpoint-graphs. 
As an example, consider the graph shown in the left of Figure 2. One possible 
Cutpoint-set for this graph is the set {A, C}. The Cutpoint-graph over these 
nodes is shown on the right hand side of the figure. We will use this graph as a 
running example to illustrate our algorithm. 
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L(2) 




S(2) -1 

Fig. 2. An see S (left) and its eutpoint-graph with respect to the eutpoint-set {A,e}. 



A llocate-SCC (Figure 1) progresses by calling itself recursively on the graph 
Sc- It is easy to see that Sc is also an SCC, but smaller. This ensures that 
Allocate -see terminates. This observation is proven in Lemma 1 below. 

Lemma 1. A Cutpoint- graph Sc of an SCC S is an SCC and has less vertices 
than S. 

(All proofs appear in the full version of this article [12]). 

In case S' is a single node Allocate-SCC assigns it the range {0} and returns. 
The set of values returned from the recursive call are then used to assign values to 
the rest of the graph (the regular nodes at this level). The process of assigning 
these values is involved, and is done in four phases as described in the next 
subsection. 

3.2 The Four Phases of Allocating Ranges to Regular Nodes 

Ranges are assigned to regular nodes in four phases. In the first two phases 
the assigned ranges, called cycle-values and dfs-values respectively, should be 
thought of as representative values: they do not necessarily belong to the final 
ranges assigned by the algorithm. In Phase 3, these values will be combined 
with the ranges assigned to the outpoints to compute the final ranges for all the 
regular nodes. In the description of these phases we will use the notion of tight 
assignments, defined as follows: 

Definition 5 (Tight assignments). Civen a directed weighted path from node 
X to node y with an assignment of a single value to each node, this assignment 
is called tight with respect to the value assigned to x, if and only if all the 
inequalities represented by the path are satisfied and the value at each node other 
than X is the largest possible. In that case, the path is said to be tight with respect 
to the value at x. 

Phase 1. The role of Phase 1 is to find values that satisfy all non-positive cycles 
in the graph, assuming that the cutpoints in the cycles are assigned the value 0 
(this assumption will be removed in Phase 3). 

We find all the non-positive cycles, each of which, by definition, has one or 
more cutpoints. For every path p on that cycle from cutpoint Ci to cutpoint Cj, 
we then assign 0 to Ci and tight values with respect to Cfs assignment, to the 
other nodes in p except Cj. Typically there is only one cutpoint in the cycle. 
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which can be thought of as a particular case of the above one where C 2 = C\. If 
in this step a node a; on p is assigned a value v, we say that this value is obtained 
from Ci- 

At the end of this phase regular nodes have one value for each non-positive 
cycle that goes through them, while outpoints have the single value 0. In Figure 2 
these values are shown in parenthesis. Note that Y has an empty set of values 
as there are no non-positive cycles going through it. 

The set of values associated with each node x is called the cycle-values cor- 
responding to level 0 and is denoted by cycle-values^ (x) (0 being the value 
assumed at outpoints). Further, the subset of cycle-values at x obtained from 
Ci is denoted by cycle-valuesQfx). 



Phase 2. The role of Phase 2 is to assign values that satisfy all acyclic paths 
in the SCC starting from a outpoint, assuming that the outpoint has a value 0 
(this assumption will be removed in Phase 3). In the second phase, we begin by 
creating a new set of values at each node called dfs-values. Regular nodes have 
one dfs-value corresponding to each outpoint that can reach it directly (that 
is, without going through any other outpoint). For a node x and a outpoint Ci 
that can reach it directly (i.e. not through other outpoints), denote the dfs-value 
corresponding to Ci at level 0 by df s-values^.{x). The value of df s-values^\x) 
is calculated as follows. Let n be the number of direct paths from Ci to x, and let 
vi, ... ,Vn be values corresponding to tight assignments to x with respect to Ci. 
Then dfs-values^ fx) = min{z;i . . . Vn}. The implementation of Phase 2 involves 
a simple Depth-First Search (DFS) which starts from outpoints and backtracks 
on reaching any outpoint. 

Referring to Figure 2, df s-values\{Y) = —1 and dfs-values q{Y) = —100. 
Thus, the dfs-values for Y are [-1,-100]. The other dfs-values are shown in 
Figure 3 in square brackets. The first value is obtained from A and the other, if 
present, from C. 




Fig. 3. The dfs-values of nodes appear in square brackets. Both outpoints {A and C) 
have a path to Y, hence the two dfs-values. The final allocated ranges for all nodes are 
shown in curly brackets. 



Phase 3. The role of Phase 3 is to translate the representative values com- 
puted in the first two phases into actual ranges using the ranges allocated to the 
outpoints by the recursive call. In the third phase, we use ranges assigned to 
outpoints by the recursive call Allocate-SCC (Sc) as the base for shifting the 
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representative values that were computed by the first two phases. The ranges 
assigned to the outpoints remain unchanged (none of the three phases modify 
the ranges assigned to the outpoints by deeper recursive calls) . 

For each regular node x we find all the outpoints {Cix, C 2 x, Cnx} that can 
reach it not through other outpoints. Then R{x) is given by 

R{x) = ({u + G R{Cix) A (u G {cycle-valuesc.^ (x) U df s-valuesQ.^ ( 2 ;)))}) 

Cix 

Example 2. Consider once again the graph in Figure 2. Assume that cutpoints 
have already been assigned the following values: R{A) = {—10, —1} and R{C) = 
{0}. For node F, the set cycle-values'^ (Y) is empty and dfs-values'^{Y) = 
{-1,-100}. The cutpoints that can reach Y are A and C. So the range as- 
sociated with Y is the union of {—100} (the values from C) and {—11, —2} (the 
values from A). 

The value u-\-v ^ R{x) such that u G R{Cix) and u is a dfs-value or a cycle-value 
is said to correspond to level u at Cix- 

Phase 4. We now add one more value, called the virtual dfs-value, to each 
regular node. The need to have an additional value is explained as follows. Given 
a satisfiable subgraph S' C G, the values assigned in the previous phases can be 
used for a node x only if it can be reached from a outpoint. For the case where 
X is not reachable from any outpoint in S' we need to have an extra value. 

We allocate virtual dfs-values by starting from the highest value at each 
outpoint, and going backward along the edges until we reach another outpoint. 
We assign tight values along each reverse path with respect to the starting 
outpoint. At the end, for each node x we pick the maximum value among all the 
values assigned by different paths and make it the virtual dfs-value. 

From now on, by dfs-values of a node x we will mean all the dfs-values 
corresponding to all levels and cutpoints. We use the term cycle-values in a 
similar way. Considering the graph in Figure 2, the final set of values assigned to 
all nodes is shown in Figure 4. Underlined numbers are those that were allocated 
in Phase 4. 



A{-10,-1 




Y{-100,-11,^} 

j X{-101,-12,1,^} 

L{-100, -11, 2.^} 



S{-8, 1,^} R{-9,0,r3} 



C(0) M{-99,-10,3,U} 




A{-10,Ai} lO' 



Fig. 4. The final allocated ranges for all nodes are shown in curly brackets. The right 
figure shows the ranges allocated for the Cutpoint-Graph, which are preserved in higher 
recursion levels, as can be seen on the left graph. Underlined numbers are allocated in 
Phase 4. 
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4 Correctness of the Algorithm 

Assuming that Allocate-SCC \s correct, the correctness of Allocate-Graph is 
easy to see: Allocate-Graph simply shifts the ranges allocated to SCCs and 
assigns a single value to nodes between them so that all the inequality con- 
straints between the SCCs are satisfied. From now on we will focus on proving 
the correctness of Allocate-SCC {the proofs of most of the lemmas in this sec- 
tion appear in the full version of this article [12], and are omitted here due to 
lack of space). We will prove that the procedure terminates and that the ranges 
it allocates are adequate. 

Termination of Allocate-SCC is guaranteed by Lemma 1 because it implies 
that the number of nodes decreases as we go from S to Sc- This ensures that 
the size of the considered SCC decreases in successive recursive calls until it is 
called with a single node and returns. 

We now have to show that Allocate-SCC alloca,tes adequate ranges. Assume 
that S is an SCC and A llocate-SCCuses a set of outpoints C in allocating ranges 
to S. Given a satisfiable subgraph S' C S, we describe an assignment procedure 
that assigns values to its nodes from the ranges allocated to it by Allocate-SCC, 
which satisfy all of the predicates represented by S' . The assignment procedure 
and the proof of correctness are explained using an augmented graph of S' , de- 
noted by S'^^g . Essentially the augmented graph reflects the tightest constraints 
in the original graph between nodes in S' . Clearly if we satisfy tighter constraints 
than we can satisfy the original set of constraints in S' . 

Building the Augmented Graph. We construct the augmented graph as 
follows. Find all paths P in S' starting from some cutpoint Ci and ending at a 
regular node x such that: 

— The path P does not occur as part of any non-positive cycle in S 

— If nodes in P are given corresponding dfs-valuesQ. values then it is not tight 

with respect to value 0 at cutpoint Ci- 

For each such path we add an edge from Ci to x with a weight equal to 
— {df s-valuesQ.{x)) (the negation of the level 0 dfs-value of x corresponding 
to Ci). Note that such an edge need not be a part of the graph C. By the fol- 
lowing lemma, if S' is satisflable then the augmented graph S'j,^^g is satisflable 
as well. 

Lemma 2. If S' is a satisfiable subgraph of S then the graph S'^^g as con- 
structed above is satisfiable as well. Further, any assignment that satisfies S'^^g 
satisfies S'. 

Example 3. Suppose we are given a graph S' as shown in Figure 5 (refer only 
to the solid edges). S' is a satisflable subgraph of the graph shown in Figure 2. 
The dfs-value of X and Y corresponding to level 0 and cutpoint A are -1 and -2 
respectively (shown in square brackets). In S' , the path A ^ Y ^ A is not tight 
with respect to these values. We therefore augment S' by adding two edges: one 
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A[o](-i). 




S(-2) R(-3) 



X[-2](-3) 



' o 

Y[-l](-2) 



-1 

C(0) 



L(-2; 



M(-l) 




Fig. 5. The graph S' (solid edges), and its augmentation (adding the dotted edges) 
S'Aug for the first and second recursion levels. The augmented graph can be assigned 
tight dfs-values. The values assigned by the assignment procedure (see Section 4.1) are 
shown in parenthesis. 



Input: Augmented satisfiable subgraph S'^ug 

Output: A satisfying assignment to S'^ug from the ranges allocated by 

Allocate-SCC. 

1. For each regular node x find all outpoints {Cix ■ ■ - Cmx} in S'^-ag that 
can reach it directly, not through any other outpoint. 

2. For each outpoint Cix find all direct paths P to x , and for each 
such path find tight values assuming the value of outpoint Cix is us 
assigned by the previous recursive call . 

3. Find the minimum among all these tight values corresponding to all 
outpoints in {Cix---Cmx} und assign it to x. 

4. For nodes that cannot be reached from any point in C' assign them 
their virtual dfs-values (see end of Section 3.2). 



Fig. 6. The assignment procedure DFS-Assign, which demonstrates how to assign val- 
ues to a given satisfiable subgraph S' from the ranges allocated by Allocate-SCC. 



from A to Y with weight 1 and one from A to X with weight 2, both depicted as 
dotted edges in the graph. These edges reflect the tightest constraints between 
A, Y and X. 

4.1 The Assignment Procedure 

The assignment procedure assigns values to S'jyug that satisfy its predicates. By 
Lemma 2 these values satisfy S' as well. The assignment procedure is recursive: 
we first handle the graph ^ug}^ cutpoint graph of where C is a set 
of outpoints of S'^.^g and C C C. The base case of the recursion corresponds to 
a graph with a single node to which we assign the value 0. Assume that we have 
already assigned values, recursively, to nodes in C . Now we assign values to all 
the regular nodes in by starting at outpoints (with the values assigned to 

them by the deeper recursion calls) and doing a DFS-like search. This procedure 
is shown below in Figure 6. 

Referring to Figure 5, the values assigned by DFS-Assign are shown in paren- 
thesis beside each node. At the end of this procedure each node in S' is assigned 
a single value such that all predicates in S' are satisfied (yet to be proved) . 

Then we have the following lemma: 
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Lemma 3. Assuming that the assignment procedure assigns satisfying values to 
a outpoint graph from the appropriate ranges, it does so for as 

well from the ranges allocated by Allocate-SCC. 

4.2 The Ranges Allocated by Allocate-SCC Kve Adequate 

Our goal is to prove that A 1 1 ocate-SCC assigns adequate ranges to the nodes of 
any given SCO S. We will use the following lemma 

Lemma 4. Assuming Allocate-SCC assigns adequate ranges to Sc, it assigns 
adequate ranges to all nodes of S. 

This lemma follows directly from Lemma 3. Now we prove the main theorem. 
Theorem 1. Allocate-SCC assigns adequate ranges to nodes of any given 

sees 

Proof. The termination of Allocate-SCC follows from Lemma 1. We prove the 
correctness of Allocate-SCC hy induction on the number of outpoints present 
in the given SCC. The base case is an SCC with one node for which the theorem 
holds trivially. For the inductive step, it is shown in Lemma 4 that if we can 
assign adequate ranges to the Cutpoint-graph Sc then the four phases assign 
adequate ranges to all nodes of the SCC. Thus it follows that for any SCC S 
Allocate-SCC assigns adequate values to the nodes. □ 



5 Experimental Results 

We now present results of running Alio cat e-Graph on different benchmarks. 
The results are summarized in the table below. 



Example 


SMOD 


UCLID 


SEP 


Example 


SMOD 


UCLID 


SEP 


bfl2.ucl 


12 


101 


28 


code27s.smv 


85 


104 


94 


bflS.ucl 


15 


170 


48 


code32s.smv 


114 


109 


120 


bfld.ucl 


21 


158 


33 


code37s.smv 


57 


71 


90 


bfb.ucl 


105 


481 


127 


code38s.smv 


32 


34 


106 


bfl7.ucl 


176 


1714 


482 


code43s.smv 


361 


555 


424 


bflS.ucl 


280 


2102 


603 


code44s.smv 


69 


140 


84 


BurchDill 


250 


291 


336 


code46s.smv 


264 


287 


225 



The examples beginning with bf were derived from software verification prob- 
lems. The examples beginning with code have been used in [6]. 

We compare our approach against two other methods. The first method, 
named uclid, is the standard implementation from uclid [4], which is based on 
giving a full range of 1 ... n -I- maxC, as described in the Introduction {maxC 
being the sum of all constants) . The other method, named SEP was presented in 
[10] and is discussed in the Introduction as well. The table shows the number of 
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Boolean variables that are required to encode the ranges assigned by the three 
different algorithms (given a set of n values the number of Boolean variables 
required to encode them is logarithmic in n). 

As we can see from the results above, our method is clearly superior to the 
other two methods. We outperform uclid by nearly 10 times on large examples 
(bflT.ucl and bflS.ucl). Compared to SEP we outperform it on the big examples 
by a factor of nearly 3. 

On graphs which are densely connected and have small edge weights our 
algorithm does not do as well as uclid. For such graphs, the ideal ranges seem to 
be dependent on the number of variables and relatively independent of the edges. 
Our algorithm on the other hand is heavily dependent on the edge structure in 
determining the values and as the edges to nodes ratio increases, the number 
of values assigned by our algorithm tends to increase as well. Hence on dense 
graphs, our algorithm ends up assigning too many values. 

6 Conclusion and Future Work 

We have presented a technique for allocating small adequate ranges for variables 
in a Separation Logic formula based on analyzing the corresponding inequalities 
graph. The state-space spawned by these small ranges can be then explored 
by standard SAT or BDD solvers. Experimental results show that our decision 
procedure can lead to exponential reduction in the state-space to be explored. 
A number of optimizations for making the decision procedure faster and reduce 
the ranges further were not presented here due to lack of space. The tool that 
we have developed, SMOD, is available for research purposes from [1]. 

As future work, the most important question still needs to be answered is 
whether it is possible to find a polynomial algorithm for range allocation. Al- 
though we did not experience long run-times in all the experiments that we 
conducted, this can become a bottleneck in the presence of many non-positive 
cycles in the inequalities graph. 
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Abstract. There is a large variety of algorithms for ground decision procedures, 
hut their differences, in particular in terms of experimental performance, are not 
well studied. We compare the behavior of ground decision procedures hy com- 
paring the performance of a variety of technologies on benchmark suites with 
differing characteristics. Based on these experimental results, we discuss relative 
strengths and shortcomings of different systems. 



1 Introduction 

Decision procedures are an enabling technology for a growing number of applications 
of automated deduction in hardware and software verification, planning, bounded model 
checking, and other search problems. In solving these problems, decision procedures 
are used to decide the validity of propositional constraints such as 

z = f{x - y) A X = z + y ^ -y = -(x - f(f(z))) . 

This formula is in the combination of linear arithmetic and an uninterpreted function 
symbol /. Since all variables are considered to be universally quantified, they can be 
treated as (Skolem) constants, and hence we say that the formula is ground. 

Solving propositional formulas with thousands of variables and hundreds of thou- 
sands of literals, as required by many applications, is a challenging problem. Over the 
last couple of years, many different algorithms, heuristics, and tools have been devel- 
oped in addressing this challenge. Most of these approaches either use extensions of 
binary decision diagrams [1], or an equisatisfiahle reduction of propositional constraint 
formulas to propositional logic, which is solved using a Davis-Putnam [2] search pro- 
cedure. With the notable exception of Simplify [3] there is a lack of systematic bench- 
marking for most implemented systems. Moreover, their differences, especially in terms 
of experimental performance, are not well studied. 

Ultimately, it is our goal to develop comprehensive maps of the behavior of ground 
decision procedures by comparing the performance of a variety of technologies on 

* Funded by SRI International, by NSF Grants CCR-0082560, EIA-0224465, and CCR- 
0326540, DARPA/AFRL-WPAFB Contract F33615-01-C-1908, and NASA Contract 
B0906005. 

R. Alur and D.A. Peled (Eds.): CAV 2004, LNCS 3114, pp. 162-174, 2004. 
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benchmark suites with differing characteristics, and the study in this paper is an es- 
sential first step towards this goal of comprehensive benchmarking. We have been col- 
lecting a number of existing benchmarks for ground decision procedures and applied 
them to CVC and CVC Lite [4], ICS [5], UCLID [6], MathSAT [7], Simplify [3], and 
SVC [8] systems. This required developing translators from the various benchmark for- 
mats to the input languages of the systems under consideration. For the diversity of the 
algorithms underlying ground decision procedures, we only measure runtimes. We an- 
alyze the computational characteristics of each system on each of the benchmarks, and 
expose specific sfrengfhs and weaknesses of fhe sysfems under consideration. 

Since such an endeavor should ultimately develop into a cooperative ongo- 
ing activity across the field, our main emphasis in conducting fhese experimenfs 
is nol only on reproducibility but also on reusability and extensibility of our ex- 
perimental setup. In particular, all benchmarks and all translators we developed 
for converting input formats, and all experimental results are publicly available at 
http : //www. csl . sri . com/users/demoura/gdp-benchmarks .html. 

This paper is structured as follows. In Section 2 we include a brief overview of the 
landscape of different methods for ground decision procedures, and in Section 3 we 
describe the decision procedures, benchmarks, and our experimental setup. Section 4 
contains our experimental results by pairwise comparing the behavior of decision pro- 
cedures on each benchmark set, and in Section 5 and 6 we summarize some general 
observations from these experiments and provide final conclusions. 



2 Ground Decision Procedures 

We consider ground decision procedures (GDP) for deciding propositional satisfiabil- 
ify for formulas with literals drawn from a given constraint theory T. GDPs are usually 
obtained as extensions of decision procedures for the satisfiability problem of conjunc- 
tions of constraints in T. For example, satisfiability for a conjunction of equations and 
disequations over uninterpreted terms (U) is decidable in 0(nlog(n)) using congru- 
ence closure [9]. Conjunctions of constraints in rational linear arithmetic are solvable 
in polynomial time, although many algorithms such as Simplex have exponential worst- 
case behavior. In the theory T> of difference logic, arithmetic constraints are restricted 
to constraints of the form x — y < c with c a constant. An 0{n^) algorithm for the 
conjunction of such constraints is obtained by searching, using the Bellman-Ford algo- 
rithm, for negative-weight cycles in the graph with variables as nodes and an edge of 
weight c from xtoy for each such constraints. For individual theories T i with decidable 
satisfiability problems, the union of all Tj’s is often decided using a Nelson-Oppen [10] 
or a Shostak-like [11,12] combination algorithm. 

Given a procedure for deciding satisfiability of conjunctions of constraints in T it is 
straightforward to decide propositional combinations of constraints in T by transform- 
ing the formula into disjunctive normal form, but this is often prohibitively expensive. 
Better alternatives are to extend binary decision diagrams to include constraints instead 
of variables (e.g. difference decision diagrams), or to reduce the propositional constraint 
problem to a purely propositional problem by encoding the semantics of constraints in 
terms of added propositional constraints (see, for example. Theorem 1 in [13]). Algo- 
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Table 1. Command line options used to execute GDPs. 



ICS 


ulimit -s 30000; ics prohlem-name . Ics 


UCLID 


uclid problem-name .ucl sat 0 zchaff 


cvc 


cvc -i-sat < problem-name . cvc 


CVC Lite 


cvcl -i-sat fast < problem-name .cvc 


SVC 


SVC problem -name. SVC 


Simplify 


Simplify problem-name . smp 


Math-SAT 


mathsat.linux problem-name .ms -bj math -heuristic SatzHeur 



rithms based on this latter approach are characterized by the eagerness or laziness with 
which constraints are added. 

In eager approaches to constructing a GDP from a decision procedure for T, propo- 
sitional constraints formulas are transformed into equisatisfiable propositional formu- 
las. In this way, Ackermann [14] obtains a GDP for the theory U by adding all possible 
instances of the congruence axiom and renaming uninterpreted subterms with fresh 
variables. In the worst case, the number of such axioms is proportional to the square 
of the length of the given formula. Other theories such as S-expressions or arrays can 
be encoded using the reductions given by Nelson and Oppen [10]. Variations of Acker- 
mann’s trick have been used, for example, by Shostak [15] for arithmetic reasoning in 
the presence of uninterpreted function symbols, and various reductions of the satisfiabil- 
ity problem of Boolean formulas over the theory of equality with uninterpreted function 
symbols to propositional SAT problems have recently been described [16], [17], [18]. 
In a similar vein, an eager reduction to propositional logic works for constraints in dif- 
ference logic [19]. 

In contrast, lazy approaches introduce part of the semantics of constraints on de- 
mand [13], [20], [7], [21]. Let (p be the formula whose satisfiability is being checked, 
and let L be an injective map from fresh propositional variables to the atomic subfor- 
mulas of <j) such that L~^[ip] is a propositional formula. We can use a propositional 
SAT solver to check that L~^[(p] is satisfiable, but the resulting truth assignment, say 
li A ... Ain, might be spurious, that is L[[i A . . . A [„] might not be ground-satisfiable. If 
that is the case, we can repeat the search with the added lemma clause (^[i V . . .V ~^ln) 
and invoke the SAT solver on (^[i V ... V ~^ln) A L~^[(j)]. This ensures that the next 
satisfying assignment returned is different from the previous assignment that was found 
to be ground-unsatisfiable. In such an offline integration, a SAT solver and a constraint 
solver can be used as black boxes. In contrast, in an online integration the search for sat- 
isfying assignments of the SAT solver is synchronized with constructing a correspond- 
ing logical context of the theory-specific constraint solver. In this way, inconsistencies 
detected by the constraint solver may trigger backtracking in the search for satisfying 
propositional assignments. An effective online integration can not be obtained with a 
black-box decision procedures but requires the constraint solver to process constraints 
incrementally and to be backtrackable in that not only the current logical context is 
maintained but also contexts corresponding to backtracking points in the search for 
satisfying assignments. The basic refinement loop in the lazy integration is usually ac- 
celerated by considering negations of minimal inconsistent sets of constraints or “good” 
over-approximations thereof. These so-called explanations are either obtained from an 
explicitly generated proof object or by tracking dependencies of facts generated during 
constraint solver runs. 
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(c)SAL suite 



(d)SEP suite 



Fig. 1. Benchmark suites with number of boolean (dark gray) and non-boolean (light gray) vari- 
ables in each problem. 



3 Experimental Setup 

We describe the setup of our experiments including the participating systems, the bench- 
marks and their main characteristics, and the organization of the experiments itself. This 
setup has been chosen before conducting any experiments. 

3.1 Systems 

The systems participating in this study implement a wide range of different satisfia- 
bility techniques as described in Section 2. All these GDPs are freely available and 
distributed^, and in each case we have been using the latest version (as of January 10, 
2004). We provide short descriptions in alphabetical order. 

The Cooperating Validity Checker. (CVC 1.0a, http : / /verify. Stanford, edu 
/CVC/) [4] is a GDP for the combination of theories including linear real arithmetic A, 
uninterpreted function symbols U, functional arrays, and inductive datatypes. Proposi- 
tional reasoning is obtained by means of a lazy, online integration of the zChaff SAT 
solver and a constraint solver based on a Shostak-like [11] combination, and explana- 
tions are obtained as axioms of proof trees. We also consider the successor system CVC 
Lite (version 1.0.1, http ; / / chicory . Stanford . edu/ CVCL/), whose architec- 
ture is similar to the one of CVC. 

The Integrated Canonizer and Solver. (ICS 2.0, http : / / ics . csl . sri . com) [5] 
is a GDP for the combination of theories including linear real arithmetic A, uninter- 
preted function symbols U, functional arrays, S-expressions, products and coproducts, 
and bitvectors. It realizes a lazy, online integration of a non-clausal SAT solver with 
an incremental, backtrackable constraint engine based on a Shostak [11] combination. 
Explanations are generated and maintained using a simple tracking mechanism. 

http : / /www . qpq . org/ 



1 
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UCLID. (version 1.0, http : / /www-2 . cs . emu . edu/~uclid/) is a GDP for the 
combination of difference logic and uninterpreted function symbols. It uses an eager 
transformation to SAT problems, which are solved using zChaff [6]. The use of other 
theories such as lambda expressions and arrays is restricted in order to eliminate them 
in preprocessing steps. 

Math-SAT. [7] (http : / / dit . unitn . it/~rseba/Mathsat . html) is a GDP 
for linear arithmetic based on a black-box constraint solver, which is used to detect 
inconsistencies in constraints corresponding to partial Boolean assignments in an of- 
fline manner. The constraint engine uses a Bellman-Ford algorithm for difference logic 
constraints and a Simplex algorithm for more general constraints. 

Simplify. [3] (http : //research . Compaq . com/SRC/esc/Simplify . html) 
is a GDP for linear arithmetic A, uninterpreted functions U, and arrays based on the 
Nelson-Oppen combination. Simplify uses SAT solving techniques, which do not incor- 
porate many efficiency improvements found in modern SAT solvers. However, Simplify 
goes beyond the other systems considered here in that it includes heuristic extensions 
for quantifier reasoning, but this feature is not tested here. 

Stanford Validity Checker. (SVC 1.1, http : //chicory . Stanford . edu/SVC/) 
[8] decides propositional formulas with uninterpreted function symbols U, rational lin- 
ear arithmetic A, arrays, and bitvectors. The combination of constraints is decided using 
a Shostak-style combination extended with binary decision diagrams. 

3.2 Benchmark Suites 

We have included in this study freely distributed benchmark suites for GDPs with con- 
straints in A and U and combinations thereof. These problems range from standard 
timed automata examples to equivalence checking for microprocessors and the study 
of fault-tolerant algorithms. For the time being we do not separate satisfiable and un- 
satisfiable instances. Since some of the benchmarks are distributed in clausal and other 
in non-clausal form, it is difficult to provide measures on the difficulty of these prob- 
lems, but Figure 1 contains the number of variables for each benchmark problem. The 
dark gray bars represents the number of boolean and the light gray bars the number of 
non-boolean variables. 

The Math-SAT Benchmark Suite. 

(http : / /dit . unitn . it/~rseba/Mathsat . html) is composed of timed au- 
tomata verification problems. The problems are distributed only in clausal normal form 
and arithmetic is restricted to coefficients in { — 1,0,1}. This benchmark format also 
includes extra-logical search hints for the Math-SAT system. This suite comprises 280 
problems, 159 of which are in the difference logic fragment. The size of the ASCII rep- 
resentation of these problems ranges from 4Kb to 26Mb. As can be seen in Figure 1(a) 
most of the variables are boolean^. 

^ We suspect that many boolean variables have been introduced through conversion to CNF. 
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Table 2. Classification. 



Benchmark suite 


Sat 


Unsat 


Unsolved 


Math-Sat 


37 


224 


19 


UCLID 


0 


36 


2 


SAL 


21 


167 


29 


SEP 


9 


8 


0 



Table 3. Number of timeouts and aborts for each system. 



1 


Ksa 


lygjgm 




cve Lite 




fcHBiaHM 


Math-SAT 


Math-SAT 




mm 


3 


mm 


19 




91 


52 


suite 




mm 


21 


58 


61 


mm 


0 


0 


UCLID 




4 


2 


0 


9 




24 


n/a 


suite 




4 


0 


14 


9 


■la 


0 


n/a 


SAL 




1 


36 


1 


115 


mg 


99 


n/a 


suite 




mrJjm 


4 




7 


4 


4 


n/a 


SEP 




KS 


0 


0 


1 


1 


0 


n/a 


suite 




■a 


1 


1 


1 


0 


0 


n/a 



The UCLID Benchmark Suite, (http : //www-2 . cs . emu . edu/~uclid) is de- 
rived from processor and cache coherence protocol verifications. This suite is dis- 
tributed in the SVC input format. In particular, propositional structures are non-clausal 
and constraints include uninterpreted functions U and difference constraints T>. Alto- 
gether there are 38 problems, the size of the ASCII representation ranges from 4Kb to 
450Kb, and the majority of the literals are non-boolean (Figure 1(b)). 

The SAL Benchmark Suite, (http://www.csl.sri.com/users/demoura/ 
gdp-benchmarks .html) is derived from bounded model checking of timed au- 
tomata and linear hybrid systems, and from test-case generation for embedded con- 
trollers. The problems are represented in non-clausal form, and constraints are in full 
linear arithmetic. This suite contains 217 problems, 1 10 of which are in the difference 
logic fragment, the size of the ASCII representation of these problems ranges from 1Kb 
to 300Kb. Most of the boolean variables are used to encode control flow (Figure 1(c)). 

The SEP Benchmark Suite. (http://iew3.technion.ac.il/~ofers/ 
smtlib- local/index . html) is derived from symbolic simulation of hardware 
designs, timed automata systems, and scheduling problems. The problems are repre- 
sented in non-clausal form, and constraints in difference logic. This suite includes only 
17 problems, the size of the ASCII representation of these problems ranges from 1.5Kb 
to 450Kb. 

We developed various translators from the input format used in each benchmark 
suite to the input format accepted by each GDP described on the previous section, but 
Math-SAT. We did not implement a translator to the Math-SAT format because it only 
accepts formulas in the CNF format, and a translation to CNF would destroy the struc- 
tural information contained in the original formulas. In addition, no specifics are pro- 
vided for generating hints for Math-SAT. In developing these translators, we have been 
careful to preserve the structural information, and we used all the information available 
to use available language features useful for the underlying search mechanisms^. The 

^ We have also been contemplating a malicious approach for producing worst-possible transla- 
tions, but decided against it in such an early stage of benchmarking. 
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UCUD 



(d) (e) (f) 

Fig. 2. Runtimes (in seconds) on Math-SAT benchmarks. 



Math-SAT search hints, however, are ignored in the translation, since this extra-logical 
information is meaningless for all the other tools. 



3.3 Setup 

All experiments were performed on machines with IGHz Pentium III processor 256Kb 
of cache and 512Mb of RAM, running Red Hat Linux 7.3. Although these machines 
are now outdated, we were able to set aside 4 machines with identical conhguration for 
our experiments, thereby avoiding error-prone adjustment of performance numbers. We 
considered an instance of a GDP to timeout if it took more than 3600 secs. Each GDP 
was constrained to 450Mb of RAM for the data area, and 40Mb of RAM for the stack 
area. We say a GDP aborts when it runs out of memory or crashes for other reasons. 
With these timing and memory constraints running all benchmarks suites requires more 
than 30 CPU days. 

For the diversity of the input languages and the algorithms underlying the GDPs, we 
do not include any machine-independent and implementation-independent measures. 
Instead, we restrict ourselves to reporting the user time of each process as reported by 
Unix. In this way, we are measuring only the systems as implemented, and observa- 
tions from these experiments about the underlying algorithms can only be made rather 
indirectly. 

With the notable exception of CVC Lite, all GDPs under consideration were ob- 
tained in binary format from their respective web sites. The CVC Lite binary was ob- 
tained using g++ 3.2.1 and configured in optimized mode. Table 1 contains the com- 
mand line options used to execute each GDP“*. 

In [20] the depth first search heuristic (option -dfs) is reported to produce the best overall 
results for CVC, but we did not use it because this flag causes CVC to produce many incorrect 
results. 
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Fig. 3. Runtimes (in seconds) on the Math-SAT benchmarks (cont.). 



4 Experimental Results 

Table 2 shows the number of satisfiable and unsatisfiable problems^ for each bench- 
mark, and a problem has been classified “Unsolved” when none of the GDPs could 
solve it within the time and memory requirements. The scatter graphs in 

- Figures 2 and 3 include the results of running CVC, ICS, UCLID, MathSAT, and 
SVC on the Math-SAT benchmarks, 

- Figure 4 contains the runtimes of CVC, ICS, SVC, UCLID on the UCLID bench- 
marks, and 

- Figure 5 reports on our results of running CVC, ICS, SVC, and UCLID on the SAL 
benchmarks 

using the experimental setup as described in Section 3. Points above (below) the diago- 
nal correspond to examples where the system on the x (y) axis is faster than the other; 
points one division above are an order of magnitude faster; a scatter that is shallower 
(steeper) than the diagonal indicates the performance of the system on the x (y) axis de- 
teriorates relative to the other as problem size increases; points on the right (top) edge 
indicate the x (y) axis system timed out or aborted. Multiplicities of dots for timeouts 
and aborts are resolved in Table 3. 

For lack of space, the plots for Simplify and CVC Lite are not included here. 
Simplify performed poorly in all benchmarks except SEP, and does not seem to be 
competitive with newer GDP implementations. In the case of CVC Lite, its predeces- 
sor system CVC demonstrated over-all superior behavior. Also, the SEP suite did not 

^ For validity checkers, satisfiable and unsatisfiable should be read as invalid and valid instances 
respectively. 
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Fig. 4. Runtimes (in seconds) on the UCLID benchmarks. 



prove to distinguish well between various systems, as all but one problem could be 
solved by all systems in a fraction of a second. All these omitted plots are available at 
http : //www. csl . sri . com/users/demoura/gdp-benchmarks .html. 

Figures 2 and 3 compare ICS, UCLID, CVC, SVC and Math-SAT on the Math- 
SAT suite. The plots comparing UCLID contain only the problems in the difference 
logic fragment supported by UCLID. The results show that the overall performance of 
ICS is better than those of UCLID, CVC and SVC on most problems of this suite. 

With the exception of Math-SAT (which was applied only to its own benchmark 
set), every other system failed on at least several problems - mainly due to exhaustion 
of the memory limit. Also, the Math-SAT problems proved to be a non-trivial test on 
parsers as SVC’s parser crashed on several bigger problems (see Table 1). The perfor- 
mance of SVC is affected by the size of the problems and the CNF format used (Fig- 
ure 3(d)) on this suite, since its search heuristics are heavily dependent on the proposi- 
tional structure of the formula. On the other hand, the performance of the non-clausal 
SAT solver of ICS is not quite as heavily impacted by this special format. In fact, ICS 
is the only GDP which solves all problems solved by Math-SAT (Figure 2(d)), and it 
also solved several problems not solved by Math-SAT, even though search hints were 
used by Math-SAT. UCLID performs better than CVC and SVC on most of the bigger 
problems (Figures 2(e) and 2(f)). 

Figure 4 compares ICS, UCLID, CVC, and SVC on the UCLID benchmarks. What 
is surprising here is that SVC is competitive with UCLID on UCLID’s own benchmarks 
(Figure 4(e)). Also, the overall performance of SVC is superior to its predecessor CVC 
system (Figure 4(f)). ICS does not perform particularly well on this benchmark set in 
that it exhausts the given memory limit for many of the larger examples. 

Figure 5 compares ICS, UCLID, CVC, and SVC on the SAL benchmarks. ICS 
performs better than UCLID on all examples (Figure 5(a)), and its overall performance 
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is better than CVC (Figure 5(b)). ICS, UCLID and CVC fail on several problems due 
lack of memory. Although the overall performance of CVC seems better than UCLID, 
the latter managed to solve several problems where CVC failed (Figure 5(d)). 



5 Observations and Research Questions 

As has been mentioned in Section 3, the results produced in our experiments measure 
mainly implementations of GDPs. Nevertheless, we try to formulate some more gen- 
eral observations about algorithmic strengths and weaknesses from these experiments, 
which should be helpful in studying and improving GDPs. 

Insufficient Constraint Propagation in Lazy Integrations. The eager UCLID system 
usually outperforms lazy systems such as ICS, CVC, and Math-SAT on problems which 
require extensive constraint propagation. This seems to be due to the fact that the un- 
derlying lazy integration algorithms only propagate inconsistencies detected by the con- 
straint solver, but they do not propagate constraints implied hy the current logical con- 
text. Suppose a hypothetical problem which contains the atoms {x = y,y = z,x = z}, 
and during the search the atoms x = y and y = z are assigned to true, then the atom 
X = z could be assigned to true by constraint propagation (transitivity), but none of 
the existing lazy provers propagates this inference. In contrast, these kinds of propa- 
gations are performed in eager integrations, since the unit-clause rule of Davis-Putnam 
like SAT solvers assumes the joh of propagating constraints. 

Arithmetical Constraints in the Eager Approach. On arithmetic-intensive problems, 
UCLID usually performs worse than other GDPs with dedicated arithmetic constraint 
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solvers. In particular, UCLID’s performance is heavily affected by the size of con- 
stants. One of the approaches used by UCLID to reduce difference logic to proposi- 
tional logic [19] constructs a graph where each node represents an integer variable, and 
a weighted edge represents a difference constraint. Starting from this graph, a clause is 
created for each negative cycle. We believe several irrelevant clauses are generated in 
this process. We say a clause is irrelevant when it cannot be falsified during the search 
due to, for instance, the presence of other constraints. In several examples, UCLID con- 
sumed all memory in the translation to propositional logic. For instance, the problem 
abz 5 - 9 0 0 was easily solved by ICS and Simplify using less than 40Mb, but UCLID 
consumed all memory during the translation to propositional logic. 

Performance vs. Expressiveness. The version of UCLID we have been using is re- 
stricted to separation logic, but other systems with support for full arithmetic seem to 
be competitive even when problems are restricted to this domain. The goal should be 
a fully general system that reliably does the special cases (restricted theories, bounded 
instances) at least as well as specialized systems. 

Blind Search Problem in Lazy Solvers. The problems in the UCLID suite are mainly 
composed of non-boolean variables (Figure 1), so almost all atoms are constraints, and 
the lazy constraint solvers introduce a fresh boolean variable for each distinct constraint. 
This process produces an underconstrained formula which contains several proposi- 
tional variables which occurs only once. Thus, the begin of the search is completely 
chaotic, and arbitrary, since from the point of view of the SAT solving engine any as- 
signment will satisfy the apparently easy and underconstrained formula. We say the 
SAT engine starts an almost blind search, where the successful search heuristics devel- 
oped by the SAT community are hardly useful. Our hypothesis is corroborated by the 
Math-SAT and SAL suites, where several boolean variables are used to encode the con- 
trol flow and finite domains. In this case, although the SAT engine does not know the 
hidden relationships between the freshly added boolean variables, it is guided by the 
control flow exposed by fhe boolean variables. This hypothesis also helps explaining 
why SVC performs better than ICS and CVC on the UCLID suite, simply because SVC 
uses specialized heuristics based on the structure of the formula. 

Memory Usage. Math-SAT and Simplify are the systems using the least amount of 
memory. In contrast, CVC often aborts by running out of memory instead of running 
up its time limits. A similar phenomenon can be observed with ICS. We have traced 
this deficiency back to imprecise generation of explanations for pruning Boolean as- 
signments. For instance, union-find sfrucfures are commonly fo represenf conjunctions 
of equalities, but a naive logging mechanism of dependencies might already produce 
extraneous explanations [22]. Better tradeoffs between the accuracy of the generated 
explanations and the cost for computing them are needed. Another problem in systems 
such as ICS and CVC is the maintenance of information to perform backtracking. Math- 
SAT is a non-backtrackable system, so it does not suffer from this problem, at the cost 
of having to restart from scratch every time an inconsistency is detected. 

Loss of Structural Information. When developing the translators for these experi- 
ments, we noticed that the performance of most solvers heavily depends on the way 
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problems are encoded. For instance, the repetitive application of the transformation 
F[t] F[x] A X = t with t a term occurring in F, and x a fresh variable, transforms 
many easy problems into very hard problems for UCLID, CVC and CVC Lite. 



6 Conclusions 

Our performance study demonstrates that recently developed translation-based GDPs 
- eager or lazy - well advance the state-of-the-art. However, this study also exposes 
some specific weaknesses for each of the GDP tools under consideration. Handling 
of arithmetic needs to be improved for eager systems, whereas lazy systems can be 
considerably improved by a tighter integration of constraint propagation, specialized 
search heuristics, and the generation of more precise explanations. A main impediment 
for future improvements in the field of GDPs, however, is not necessarily a lack of new 
algorithms in the field, but rather the limited availability of meaningful benchmarks. 
Ideally, these benchmarks are distributed in a form close to the problem description 
(e.g. not necessarily in clausal normal form). A collaborative effort across the field of 
GDPs is needed here [23]. 
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Abstract. The logic of equality with uninterpreted functions (EUF) 
and its extensions have been widely applied to processor verification, 
by means of a large variety of progressively more sophisticated {lazy or 
eager) translations into propositional SAT. Here we propose a new ap- 
proach, namely a general DPLL(A) engine, whose parameter X can be 
instantiated with a specialized solver Solver^ for a given theory T, thus 
producing a system DPLL(T). We describe this DPLL(T) scheme, the 
interface between DPLL(A) and Solver^, the architecture of DPLL(A), 
and our solver for EUF, which includes incremental and backtrackable 
congruence closure algorithms for dealing with the built-in equality and 
the integer successor and predecessor symbols. Experiments with a first 
implementation indicate that our technique already outperforms the pre- 
vious methods on most benchmarks, and scales up very well. 



1 Introduction 

The logic of equality with uninterpreted functions (EUF) [BD94] and its exten- 
sions has been widely used for processor verification (see, e.g., [BD94,BGV011 
[BLS02b,VB03]). 

For deciding validity - or, dually, unsatisfiability - of formulas in this kind of 
logics, during the last five years many successively more sophisticated techniques 
have been developed, most of which can be classified as being eager or lazy. In 
the eager approaches the input formula is translated, in a single satisfiability- 
preserving step, into a propositional GNF, which is checked by a SAT solver 
for satisfiability. The lazy approaches [AGG00,dMR02,ABG+02,BDS02,FJOS03] 
instead abstract each atom of the input formula by a distinct propositional 
variable, use a SAT solver to find a propositional model of the formula, and 
then check that model against the theory. Models that are incompatible with 
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** Work partially supported by the Spanish CICYT project Maverish ref. TIC2001- 
2476 and by a FPU grant, ref. AP2002-3533, from the Spanish Secretarfa de Estado 
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the theory are discarded from later consideration by adding a proper lemma to 
the original formula. This process is repeated until a model compatible with the 
theory is found or all possible propositional models have been explored. Also 
less lazy variants exist (e.g., in CVC, [BDS02]), in which (partial) propositional 
models are checked incrementally against the theory while they are built by the 
SAT solver; however, from partial models that are consistent with the theory no 
information is derived, contrary to what is proposed here. The main advantage of 
such lazy approaches is their flexibility, since they can relatively easily combine 
new decision procedures for different logics with existing SAT solvers. 

For the logic of EUF the eager approaches are in general faster than the lazy 
ones, probably because the theory information is used to prune the search, rather 
than to validate it a posteriori. Among the eager approaches for EUF, two differ- 
ent encodings into propositional logic are at present predominant. The first one, 
known as the EIJ (or per- constraint) encoding, abstracts each equality atom 
ti=tj by a propositional variable and takes care of the equality by imposing 
additional transitivity constraints [BV02,SSB02]. A drawback of this technique 
is that when the number of transitivity constraints is too large, the exponential 
blowup in the formula may make it too hard for the SAT-solver. The second 
one, known as the small domain (SD) encoding [PRSS99,BLS02b], is essentially 
an 0{n log n)-size encoding that gets transitivity for free, at the expense of a 
certain loss of structure, by translating equalities into formulas instead of into 
single propositional variables. In order to get the best of both encodings, two 
different hybrid approaches were presented in [BLS02a] and [SLB03], based on 
an analysis of the input problem that estimates properties such as the number 
of transitivity constraints. 

In this paper we introduce the first (to our knowledge) technique that is 
not based on such lazy or eager translations into SAT. Building on independent 
previous work by some of the authors [Tin02,N003], we propose a new approach 
based on a general engine for propositional solving, DPLL(A), parametrized by 
a solver for a theory of interest. A system DPLL(T) for deciding the satisfiability 
of CNF formulas in a theory T is produced by instantiating the parameter X 
with a module Solver^, that can handle conjunctions of literals in T . For instance, 
in the case of the pure EUF logic, T is just the theory of equality. 

The basic idea is similar to the CLP{X) scheme for constraint logic program- 
ming: provide a clean and modular, but at the same time efficient, integration of 
specialized theory solvers within a general purpose engine, in our case one based 
on the Davis-Putnam-Logemann-Loveland procedure [DP60,DLL62]. In [Tin02] 
a DPLL(r) scheme was already given in a more high-level abstract form, as a 
sequent-style calculus. Although no detailed interface was defined there between 
the DPLL(A) engine and the theory solver, several optimization strategies were 
already discussed. The framework given here can be seen as a concrete realiza- 
tion of the calculus in [Tin02], except that, contrary to [Tin02], we do not expect 
Solver.^ to give always complete answers. Relaxing that requirement does not af- 
fect completeness, but turns out to be crucial for efficiency, at least in the EUF 
case. A DPLL(A) scheme was introduced and informally described in [NO03]. 
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There, however, emphasis was placed on the development of new congruence 
closure algorithms to be used in the theory solver for the EUF logic. 

The concrete DPLL(T) scheme and its architecture and implementation pre- 
sented here combine the advantages of the eager and lazy approaches. On the 
one hand, experiments reveal that, as soon as the theory predicates start playing 
a significant role in the formula, our initial implementation of DPLL(T) already 
outperforms all other approaches. On the other hand, our approach is similar in 
flexibility to the lazy approaches: more general logics can be dealt with by simply 
plugging in other solvers into our general DPLL(X) engine, provided that these 
solvers conform to a minimal interface, described later. 

This paper is structured as follows. In Section 2 we describe in detail the 
DPLL(T) scheme and the kind of logics it can be applied to, discussing the 
advantages and disadvantages of the small interface. The architecture of our 
current DPLL(X) implementation is given in Section 3. Section 4 describes our 
solver for EUF, which includes incremental congruence closure algorithms that 
deal with the built-in equality and the integer successor and predecessor symbols, 
as well as with backtracking. Finally, Section 5 gives experimental results of our 
preliminary implementation of DPLL(T) for EUF, and Section 6 concludes and 
outlines a number of promising research directions for future work, in particular 
for instantiating the DPLL(X) engine with solvers for other theories. 

2 Prom DPLL to DPLL(T) 

In this section we describe the main characteristics of the DPLL(T) scheme. Any 
DPLL(T) system consists of two parts: the global DPLL(A) module and a solver 
Solver^ for the given theory T. The DPLL(A) part is a general DPLL engine 
that is independent of any particular theory T. Here we will use as examples 
three possible instances for the theory part T : the ones for propositional logic, 
pure EUF, and EUF with successors and predecessors. 

2.1 The Logics under Consideration 

In this paper we will consider the satisfiability problem of formulas in CNF, that 
is, of a given set S (a conjunction) of clauses. By a clause we mean a disjunction 
of literals, each literal I being of the form A or ^A, where A is an atom drawn 
from a set A. What A is depends on the theory T under consideration. 

An interpretation / is a function I: A ^ {0)1}- We write / 1= Hf the literal I 
is true in /, that is, if / is a positive atom A with I (A) = 1 or / is a negated atom 
^A with I{A) = 0. For each theory T, we will consider only T -interpretations, 
that is, interpretations that agree with the axioms of T, in a sense that will be 
made precise below for the theories considered here. An interpretation (resp. T- 
interpretation) / is a model (resp. T -model) of a clause set S if in each clause of 
S there is at least one literal I that is true in I. The aim of this work is to design 
algorithms for deciding whether a clause set S has a T-model, and exhibiting 
such a model whenever it exists. For all logics under consideration in this paper, 
this problem is easily shown to be NP-complete. 
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Propositional Logic. The atoms are just propositional symbols of a set V, 
and the interpretations under consideration are unrestricted, i.e, any truth 
assignment /: P ^ {0, 1} is admitted: the theory T is empty in this case. 

Pure EUF. An atom is either of the form P{ti, . . . ,tn) where P is an n-ary 
symbol of a set of fixed-arity predicate symbols P, or an equation of the form 
s=t, where s, t and ti, ... ,tn are terms built over a set of fixed-arity function 
symbols. All constants (0-ary symbols) are terms, and /(si, . . . , s„) is a term 
whenever / is a non-constant n-ary symbol and Si, . . . , are terms^. In the 
following, lowercase (possibly indexed) s, t, and u always denote terms, and 
literals ^s=t are usually written as s^t. In pure EUF the theory T expresses 
that ‘=’ is a congruence, i.e., it is reflexive (R), symmetric (S), transitive (T), 
and monotonic (M); hence the only admissible interpretations / are the ones 
satisfying the conditions below for all terms s, t, Si, and tf. 

R: I\= s=s 

S: I\=s=t if I\=t=s 

T : I\= s=t if / ^ s=u and I ^ it=t for some term u 

Ml: I\= f{.si . . . Sn)=f{ti . . .tn) if I|=Si=ti for alH in l..n 

M2: /^P(si...s„) if J 1= P(ti .. .t„) and for all i in l..n 

EUF with Successors aud Predecessors. This subset of the CLU logic 
from [BLS02b] is the extension of EUF with two distinguished unary func- 
tion symbols, Succ and Pred. Besides the congruence axioms for ‘=’, the 
interpretations / must also satisfy the following for all terms t: 

I \= Succ{Pred{t))=t I \= Pred{Succ{t))=t Vn>0, I \= Succ^ (t) t 

where Succ^{t) denotes the term Succ{. . . Succ(t) . . .) headed with n Succ 
symbols applied to t, hence the last axiom scheme denotes an infinite set of 
axioms. Note that I\=Pred'^{t)^t is a consequence of the above. 

2.2 DPLL(X): The General DPLL Part of DPLL(T) 

The DPLL(A) part of a DPLL(T) system is the one that does not depend on the 
concrete theory T. It can be like any DPLL procedure, with basically all its usual 
features such as heuristics for selecting the next decision literal, unit propagation 
procedures, conflict analysis and clause learning, or its policy for doing restarts. 
Like in the propositional case, it always considers atoms as purely syntactic 
objects. The only substantial difference with a standard propositional DPLL 
procedure is that the DPLL(X) engine relies on a theory solver for T, denoted 
here by Solver^, , for managing all information about the current interpretation I. 

^ Here we do not consider any if-then-else constructs, since they do not increase the 
expressive power of the logics and are eliminated, in linear time, in a structure- 
preserving preprocessing phase, which moreover preserves conjunctive normal forms. 
Each occurrence of a (sub)term if-then-else{F, s,t) is replaced by a new constant 
symbol v, and {-^F V v = s) A {F y v = t) is added to the formula. Boolean 
if-then-else(A, B, C) constructs are simply considered as (“lA y B) A (A V C). 
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Another difference is that it does not use certain optimization for SAT solvers, 
such as the pure literal rule, which as already pointed out in [BDS02,Tin02], 
among others, is in general not sound in the presence of a theory T. 



2.3 The Solver^ Part of DPLL(T) 

Solver^ knows what the real atoms A are, and knows about the theory T under 
consideration. It maintains a partial interpretation /: one that is defined only 
for some literals of the set C of all literals occurring in the problem input to the 
DPLL(T) prover. In the following, the literals of C are called £-literals. Inside 
Solver^, I is seen as a stack of literals, which is possible because / can equiva- 
lently be considered as the set of all £-literals that are true in I. In the following, 
this stack will be called the /-stack. We say that a literal I is a T -consequence 
of I, denoted I\=tI, if I is true in all total T-interpretations extending I. It is 
assumed that for every £-literal I the solver is able to decide whether I \=t I 
or not. Essentially, Solver j, is an abstract data type with the following five sim- 
ple operations, which are all is needed for the interface between DPLL(A) and 
Solver ^ : 



Initialize (T : Literal set). This procedure initializes Solver^ with C. It 
also initializes the /-stack to the empty stack. 

SetTrueCZ: T-literal) : T-literal set. This function raises an “inconsis- 
tency” exception if / \=t A with the current /-stack. Otherwise it pushes 
I onto the /-stack, and returns a set of £-literals that have become a T- 
consequence of / only after extending / with 1. 

For example, in the EUF case, if / ^ a=b and / |= d=c, one of the literals 
returned by SetTrue(d=6) can be f{a,a)=f{b,c), if this is an £-literal. 

For efficiency reasons, the returned set can be incomplete; for example, in 
EUF, with SetTrue(/(o) yf /(/)) it may be expensive to detect and return 
all consequences a' yf b' where 1 1= a=a' and / \= b=b' (see also Section 4) . 

IsTrue?(0 T-literal) : Boolean. This function returns true if I[=tI, and 
false otherwise. Note that the latter can be either because I \=t A, or 
because neither I\=tI nor I\=tA. 

Backtrack (n: Natural) . This procedure pops n literals from the /-stack. Note 
that n is expected to be no bigger than the size of the /-stack. 

Explanation^ : T-literal) : T-literal set. This function returns a sub- 
set J of / such that J \=t I, for a given literal I such that I was returned 
as a T-consequence of a SetTrue(?') operation and no backtracking popping 
off this I' has taken place since then. 

Intuitively, these conditions ensure that the “proof” of / \=t I that was found 
when V was asserted is still valid. Note that several such J may exist; for 
example, in EUF, both { ayf 6, b=c } and { ayf d, d=c } may be correct 
explanations for the literal a yf c. 
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This operation is used by DPLL(X) for conflict analysis and clause learning. 
In our implementation it is used speciflcally to build an implication graph 
similar to those built by modern SAT solvers. For this application, it is also 
required that all literals in J are in the /-stack at heights lower than or 
equal to the height of the literal I' (i.e., no other “later” proof is returned; 
see Section 3 for an example). 

We point out that the solver is independent from the characteristics of the 
DPLL(A) procedure that will use it. For example, it does not know about de- 
cision levels, unit propagation, or related features. In fact, a solver with this 
interface could be used as well in the context of non-DPLL systems, such as 
the lazy (or lemmas-on-demand) approaches, or even in resolution-based sys- 
tems. Note that something like the Explanation operation is needed as well to 
produce the lemmas in such lazy approaches, and in general, in any deduction 
system that has to produce proof objects. 

These five operations constitute in some sense a minimal set of operations for 
the exchange of information between the two main components of a DPLL(T) 
system. Having only these operations provides a high degree of modularity and 
independence between the global DPLL(A) engine and the theory solver Solver^. 

Our current DPLL(T) implementation has a DPLL(A) engine built in house 
to use the operations above. However, because of the simplicity and modularity 
of the solver interface, we believe that developers of state-of-the-art SAT solvers 
would need relatively little work to turn their solvers into DPLL(A) engines. 

Although our implementation already performs very well for EUF, a tighter 
interconnection by means of more operations might enhance its performance. 
This could be achieved for example by having more fine-tuned theory-specific 
heuristics for choosing the next decision literal (see Section 6). However, possible 
efficiency gains would come at the expense of less modularity, and hence require 
more implementation work, especially when developing solvers for new theories. 

3 Our Architecture for DPLL(X) 

The current modular design is the result of developments and experiments with 
our own DPLL(A) implementation (in C). The current system has a clear Chaff- 
like flavor, mirroring Chaff’s features as described in [MMZ+01,ZMMM01], with 
a 2-watched literal implementation for efficient unit propagation, and the VSIDS 
heuristic for selecting the next decision literal, in combination with restarts and 
the lUIP learning scheme. Most of Chaff’s features lift from DPLL to DPLL(A) 
without major modifications, so here we only point out some differences. 

One difference arises in unit clause detection. In the propositional case, when 
a literal I becomes true, is the only literal that becomes false before unit 
propagation is applied to the clause set, so new unit clauses can only come from 
clauses containing In DPLL(T), on the other hand, additional literals can be 
set to false as a consequence of the assertions made to the theory solver. This 
possibility considerably increases the extent of unit propagations in DPLL(T) 
and, correspondingly reduces the size of the search space for the DPLL(X) en- 
gine. For example, in the EUF case, if ayf cV P is in the clause set and I \=t a=b, 
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a^c — > P 



a^c — > P 



/(a)//(6) — □ a/6 



□ 



c = 6 — > 

(a) Correct Explanation creates no cycles 



c = & — > 

(b) Wrong Explanation may 
create cycles 



Fig. 1. Creation of undesirable cycles in the implication graph. 



then setting a literal such as b=c to true will make a c false and so produce 
P as & new unit consequence. We point out that this fact will be discovered by 
the DPLL(X) engine only if the set returned by the call SetTrue(5=c) does 
in fact contain a ^ c which, depending on the theory solver implementation, 
might or might not be the case. In any case, failing to return a ^ c does not 
affect the completeness of the overall DPLL(T) system, only the extent of unit 
propagation. 

Another source of incomplete unit clause detection is that two different lit- 
erals may in fact be equivalent in the current state of the interpretation: again 
in the EUF case, if in Solver^ we have I \=t a=b, and there is a two-literal clause 
a=c\J b=c, then the unit consequence a=c (or, equivalently, b=c) will not be 
detected. 

Concerning conflict analysis, the construction of the implication graph is 
more complex in DPLL(T) than in Chaff. In the latter, a node labeled with a 
literal I with antecedents nodes li, ...,?„ is always due to a clause -^hV. . 
on which unit propagation has taken place. Hence, to build the implication graph 
it suffices to have, together with each such I, a pointer to its associated clause. 
In DPLL(T), an implication graph can contain, in addition to nodes like the 
above, also nodes I that are a T-consequence of their antecedents li,. . . ,ln- Such 
nodes I are those returned by SetTrue calls to Solver^, and can be recognized as 
such since they have no pointer to an associated clause. Their antecedents can 
be obtained from the solver itself by calling Explanation(Z). For example, a run 
of the DPLL(A) algorithm on the clauses: 

/(a) yf /(&) V e a=b\J a^c (Jt=by c^b a=c\J P c^b\J ^P 



can be as follows: 



1 . 

2 . 

3. 

4. 

5. 



SetTrue(/(a)yf /(&)). 

SetTrue(ay^c). 

SetTrue(c=&). 

SetTrue(P). 

SetTrue(^P). 



Decision. SetTrue returns ay^&, a T-consequence. 
Unit propagation of ayf& on a=6V ay^c. 

Unit propagation of a yf & on a=b V c^b. 

Unit propagation of ay^c on a=c V P. 

Unit propagation of c^b on c^bV ^P. Conflict! 



The implication graph is built backwards from P and ^P. When ay^ & is reached, 
Explanation(ayf 6) is called, and indeed one possible explanation is f{a)^f{b). 
This process results in the graph depicted in Figure 1(a). Note that {ay^c, c=6} 
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also has 07^6 as a T-consequence, but using that set would lead to the graph in 
Figure 1(b), which has cycles. To avoid cycles, it is enough for Explanation(Z) 
to return explanations that are “older” than I, as precised in the definition of 
Explanation. 

4 A Solver for EUF 

A key ingredient for a solver for EUF is an algorithm for congruence closure. Now, 
the 0{n log n) DST congruence closure algorithm given in [DST80] (see also 
[NO80]) needs a relatively expensive initial transformation to directed acyclic 
graphs of outdegree 2. In [NO03] we proposed to replace this transformation by 
another one, at the formula representation level: we Currify, like in the implemen- 
tation of functional languages; as a result, there will be only one binary “apply” 
function symbol (denoted here by a dot “•”) and constants. For example, Cur- 
rifying f{a,g{b),b) gives •(•(•(/, a), •((/,&)), 6). Furthermore, like in the abstract 
congruence closure approaches (cf. [Kap97,BT00]), we introduce new constant 
symbols c for giving names to non-constant subterms t; such t are then replaced 
everywhere by c, and the equation t = c is added. Then, in combination with 
Currification, one can obtain the same efficiency as in more sophisticated DAG 
implementations by appropriately indexing the new constants like c, which play 
the role of the pointers to the (shared) subterms like t in the DAG approaches. 
For example, we flatten the equation •(•(•(/, a), •((?, 6)), &)=6 by replacing it by 
the four equations •(/, a)=c, •{g,b)=d, •{c,d)=e, and •{e,b)=b. 

These two (structure-preserving) transformations are done in linear time once 
and for all on the input formula given to our DPLL(T) procedure. As a con- 
sequence, since all compound terms (atoms) occuring in the EUF formula have 
gotten a “name”, i.e., they are equal to some constant, all equality atoms in the 
EUF formula are in fact equations between constants. Furthermore, the congru- 
ence closure algorithm of Solver^, only needs to infer consequences from a fixed, 
static set of equations E of the form ■{a,b)=c, where a, b and c are constants 
(note that the • symbol does not occur outside E). This makes our (DST-like) 
algorithm surprisingly simple and clean, and hence easier to extend. Our im- 
plementation of [NO03] also runs in 0(n log n) time, but of course it can be 
much faster than algorithms for arbitrary terms^. Once the closure is computed, 
deciding whether two constants a and b belong to the same class, i.e., all positive 
IsTrue?(a=6) operations, can be done in constant time. 

The data structures for congruence closure are: 

1. Pending unions: a list of pairs of constants yet to be merged. 

2. The Representative table: an array indexed by constants, containing for each 
constant its current representative (this can also be seen as a union-find data 
structure with eager path eompression) . 

3. The Class lists: for each representative, the list of all constants in its class. 

^ In fact, it is about 50 times faster than earlier implementations such as [TVOl] on 
the benchmarks of [TVOl]. 
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4. The Lookup table: for each input term -(a, b), a call Lookup{Representative{a) , 
Representative{b)) returns in constant time a constant c such that -(0,6) is 
equivalent to c, and returns _L if there is no such c. 

5. The Use lists: for each representative a, the list of input equations ■{b,c)=d 
such that a is the representative of 6 or c (or of both) . 

Each iteration of the algorithm roughly amounts to picking a pending union, 
and then using the lookup table and the use lists for efficiently detecting new 
pairs of constants to be merged. We refer to [NO03] for a detailed description and 
analysis of the algorithm, as well as for its extension to successor and predecessor 
symbols, which is also 0{n log n). 

We now briefly describe the other data structures and algorithms for the 
Solver^ operations, following a notation where unprimed constants a, 6, c are 
always representatives of primed ones a' ,b' ,c'; symbols d and e can be represen- 
tatives or not. 

The Literal Lists and Their Use for SetTrue. Upon its initialization. Solver^ 
builds, once and for all, for each constant d, a positive literal list containing all 
positive DPLL(X) literals of the form d = e, i.e., all positive literals contain- 
ing d that have been communicated to Solver^ in the Initialize operation. 
Analogously, there is also a negative literal list for each d. 

If a positive SetTrue, and its subsequent congruence closure, produces a 
union such that a class with former representative a is now represented by a 
different b, then, for each a' in the class list of a, the positive literal list of a' is 
traversed and all a'=b' in this list are returned as T-consequences. Also the nega- 
tive literal list of all such a' is traversed, returning those a'yf c' such that ayfc is 
stored in Diseq, a hash table containing all currently true disequalities between 
representatives; analogously also the negative literal list of all b' is traversed. 

After a SetTrue operation of a negative equation with representative form 
a^b, the negative literal list of all a' is traversed (or equivalently, the one of all 
b', if this is expected to be less work), returning all a'^b'. 

Backtracking. This is dealt with by means of a mixed policy: unions between 
constants are stacked in order to be undone. The Diseq hash table and the Lookup 
data structure (another hash table) are not restored under backtracking but 
instead have time stamps (the /-stack height combined with a global counter). 

Explanations. A newly derived positive equality is a T-consequence of a set of 
positive SetTrue operations and of the equations inside the congruence closure 
module. Negative equations are also T-consequences of such positive equalities, 
but in addition they are always caused as well by a single negative SetTrue. 

Hence, for retrieving the explanations for DPLL(A) literals, we need to be 
able to proceed “backwards” in our data structures until the set of initial SetTrue 
operations is reached. At each congruence propagation deriving d=e as a conse- 
quence of -{a',b')=d and -{a" ,b")=e, (pointers to) these two equations are kept, 
allowing one to continue backwards with the explanations of a'=a" and b'=b" . In 
addition, there is a method for efficiently extracting from a (slightly extended) 
union-find data structure the list of unions explaining a certain equality. 
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Finally, for explaining a negative equality c?i yf ei, there is an additional table 
for retrieving the single negative SetTrue(d 2 yf 62 ) operation that caused it; the 
remaining part of the explanation can then be found backwards, as the union of 
the explanations of di=d 2 and 61 = 62 . 

5 Experimental Results 

Experiments have been done with all 54 available benchmarks (generated from 
verification problems) that can be handled in EUF with successors and predeces- 
sors. Most of them were provided by the UCLID group at CMU^. All benchmarks 
are in SVC format [BDL96] (see [BLS02a] for more details on them). 

The table below contains (in its second column) the translation times for the 
four eager approaches described in Section 1: SD and EIJ [VB03], and the two 
hybrid methods Hybridl [BLS02a] and Hybrid2 [SLB03]. All translations were 
done using the state-of-the-art translator provided within the UCLID 1.0 tool. 
Furthermore, for two major SAT solvers, zChaff [MMZ+Ol] (version 2003.12.04) 
and BerkMin [GN02] (its recent version 561) the running times on the trans- 
lated formulas are given. For a fair comparison with our system, the times for 
the zChaff and BerkMin runs include the translation times as well, since those 
translations are not mere formats conversions but the result of sophisticated al- 
gorithms for reducing the size and the search space of the propositional formula 
produced. The choice of zChaff and BerkMin is motivated by the fact that our 
current DPLL(A) engine is modeled after zChaff and that BerkMin is presently 
considered one of the best SAT solvers overall. The first table has an extra 
column with the running times of SVC (version 1.1) as well. 

Results are in seconds and are aggregated per family of benchmarks, with 
times greater than 100s rounded to whole numbers^. All experiments were run 
on a 2GHz 512MB Pentium-IV under Linux, with the same settings for each 
benchmark except for the “Two queues” benchmarks where our system had 
learning turned off. Each benchmark was run for 6000 seconds. An annotation 
of the form (n t) or (n m) in a column indicates respectively that the system 
timed out or ran out of memory on n benchmarks. Each timeout or memory out 
is counted as 6000s. 

We point out that our current DPLL(A) implementation is far less tightly 
coded than zChaff, and is more than one order of magnitude slower than zChaff 
on propositional problems, even when the overhead due to the calls to Solver^ 
is eliminated. Considering that BerkMin consistently dominates zChaff on all 
the benchmark families, it is remarkable that our system performs better than 
the UCLID-|-BerkMin combination on the great majority of the families. In fact, 
there is no unique translation-based approach that outperforms DPLL(T) on 
more than two benchmark families. Furthermore, DPLL(T) is faster than SVC 
on all benchmarks, and so also faster than the lazy approach-based systems CVC 
[BDS02] and Verifun [FJOS03] which, as shown in [FJOS03], are outperformed 

® We are grateful to Shuvendu Lahiri and Sanjit Seshia for their kind assistance. 

^ Individual results for each benchmark can be found at www.lsi.upc.es/~oliveras, 
together with all the benchmarks and an executable of our system. 
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by SVC. We see this as strong initial evidence that DPLL(T) is qualitatively 
superior to existing approaches for deciding satisfiability modulo theories®. 



Benchmark family 


SD 


BerkMin 


Chaff 


DPLL(T) 


SVC 


Buggy Cache 


2.3 


2.4 


3.4 


6.7 


(It) 6000 


Code Validation Suite 


16.2 


44.9 


43.9 


3.7 


56.9 


DLX processor 


3.9 


10.2 


13.3 


1.2 


16.9 


Elf processor 


34.1 


5882 


(It) 6104 


575 


(It) 6078 


Out of order proc.(rf) 


27.1 


(2t) 18211 


(3t) 19213 


6385 


(2t) 12666 


Out of order proc.(tag) 


54.3 


247 


1457 


1979 


(4t) 28788 


Load-Store processor 


22.2 


51.4 


239 


30.3 


(3t) 18476 


Cache Coherence Prot. 


20.4 


4151 


(It) 9634 


3601 


(4t) 26112 


Two queues 


5.1 


407 


1148 


73.6 


1872 


Benchmark family 


EIJ 


BerkMin 


Chaff 


DPLL(T) 




Buggy Cache 


6.3 


6.4 


9.3 


6.7 




Code Validation Suite 


40.7 


41 


41.8 


3.7 




DLX processor 


13 


13.9 


14.4 


1.2 




Elf processor 


(2m) 20.1 


(2m) 12021 


(2m) 12021 


575 




Out of order proc.(rf) 


70.6 


(It) 7453 


(2t) 13926 


6385 




Out of order proc.(tag) 


210 


510 


837 


1979 




Load-Store processor 


(Im) 32 


(Im) 6034 


(Im) 6037 


30.3 




Cache Coherence Prot. 


(3m) 102 


(3m) 18257 


(3m) 18437 


3601 




Two queues 


(3m) 19.4 


(3m) 18028 


(3m) 18034 


73.6 




Benchmark family 


Hybrid 1 


BerkMin 


Chaff 


DPLL(T) 




Buggy Cache 


6.2 


6.4 


9.2 


6.7 




Code Validation Suite 


13.2 


13.5 


13.5 


3.7 




DLX processor 


13.1 


14.1 


15.2 


1.2 




Elf processor 


187 


941 


1646 


575 




Out of order proc.(rf) 


65.3 


(It) 7524 


(2t) 13009 


6385 




Out of order proc.(tag) 


175 


612 


799 


1979 




Load-Store processor 


64.1 


79.6 


88.4 


30.3 




Cache Coherence Prot. 


(3m) 102 


(3m) 18257 


(3m) 18438 


3601 




Two queues 


(3m) 19.5 


(3m) 18019 


(3m) 18028 


73.6 




Benchmark family 


Hybrid 2 


BerkMin 


Chaff 


DPLL(T) 




Buggy Cache 


3 


3.2 


3.7 


6.7 




Code Validation Suite 


27.7 


28 


28.6 


3.7 




DLX processor 


11.3 


12.9 


14.3 


1.2 




Elf processor 


47.2 


3182 


5467 


575 




Out of order proc.(rf) 


53 


(It) 10626 


(2t) 13913 


6385 




Out of order proc.(tag) 


140 


6918 


(2t) 12173 


1979 




Load-Store processor 


40.1 


45.47 


47.71 


30.3 




Cache Coherence Prot. 


37.3 


209 


690 


3601 




Two queues 


5.7 


793 


1832 


73.6 





® This evidence was confirmed recently by further experiments (also reported in de- 
tail at www.lsi.upc.es/~oliveras) showing that DPLL(T)’s perfomance dominates 
that of ICS 2.0 as well on the benchmarks listed here. 
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As expected, translation-based methods will normally outperform DPLL(T) 
for problems where the theory T plays a very small role. But this is no longer 
the case when theory predicates start playing a significant role, as in the fami- 
lies Code Validation, Elf and 000 processors (rf), and Two queues. This phe- 
nomenon becomes dramatic for instance in benchmarks from group theory®. 



6 Conclusions and Future Work 

We have presented a new approach for checking satisfiability in the EUF logic. 
This approach is based on a general framework and architecture in which a 
generic DPLL-style propositional solver, DPLL(A), is coupled with a special- 
ized solver Solver^ for a given theory T of interest. The architecture is highly 
modular, allowing any theory solver conforming to a simple, minimal interface 
to be plugged in into the DPLL(A) engine. The fundamental advantage with 
respect to previous approaches is that the theory solver is not only used to val- 
idate the choices made by the SAT engine, as done for instance in CVC, but 
also to propagate the entailed literals returned by SetTrue, using information 
from consistent partial models, considerably reducing the search space of the 
SAT engine. Initial results indicate that in the EUF case this leads to significant 
speed-ups in overall performance. 

More work needs to be done on our implementation. Aspects such as lemma 
management, decision heuristics and restarting policies are still immature. More 
accurate theory-dependent heuristics need to be explored. Also minimal-model- 
preserving optimizations should be worked out; for instance, the notion of P- 
terms [BGVOl] has its counterpart in our framework, and so could be used. 

Finally, other future work of course concerns the development of new theory 
solvers, or the conversion of existing ones (e.g., those used in CVC), into theory 
solvers conforming to our interface: solvers for EUF with associativity and tran- 
sitivity (AC) properties for certain symbols, EUFM (EUF -|- memories) [BD94], 
Separation Logic [SLB03], or the full CLU logic [BLS02b]. 
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Abstract. In this work we focus on model checking of probabilistic models. 
Probabilistic models are widely used to describe randomized protocols. A 
Markov chain induces a probability measure on sets of computations. The no- 
tion of correctness now becomes probabilistic. We solve here the general prob- 
lem of linear-time probabilistic model checking with respect to tu-regular speci- 
fications. As specification formalism, we use alternating Biichi infinite-word au- 
tomata, which have emerged recently as a generic specification formalism for 
developing model checking algorithms. Thus, the problem we solve is: given a 
Markov chain M and automaton A, check whether the probability induced by 
M of L{A) is one (or compute the probability precisely). We show that these 
problem can be solved within the same complexity bounds as model checking 
of Markov chains with respect to LTL formulas. Thus, the additional expressive 
power comes at no penalty. 



1 Introduction 

In model checking, we model a system as a transition system Ai and a specification as a 
temporal formula 'll;. Then, using formal methods, we check whether At satisfies ip [7]. 
One of the most significant developments in this area is the discovery of algorithmic 
methods for verifying temporal logic properties of finite-state systems [23, 18,6,33], 
This derives its significance both from the fact that many synchronization and commu- 
nication protocols can be modelled as finite-state programs, as well as from the great 
ease of use of fully algorithmic methods. Looking at model-checking algorithms more 
closely, we can classify these algorithms according to two criteria. The first criterion is 
the type of model that we use - nondeterministic or probabilistic. The second criterion 
is the specification language. 

For nondeterministic models and linear temporal logic (LTL), a close and fruitful 
connection with the theory of automata over infinite words has been developed [32-34]. 
The basic idea is to associate with each LTL formula a nondeterministic Biichi automa- 
ton over infinite words (NBW) that accepts exactly all the computations that satisfy 
the formula. This enables the reduction of various decision problems, such as satisfia- 
bility and model checking, to known automata-theoretic problems, yielding clean and 
asymptotically optimal algorithms. Furthermore, these reductions are very helpful for 

* Supported in part by NSF grants CCR-9988322, CCR-0124077, CCR-0311326, IIS-9908435, 
IIS-9978135, and EIA-0086264, by BSF grant 9800096, and by a grant from the Intel Corpo- 
ration. 

** For a full version with proofs, see www . cs . rice . edu/~vardi /papers. 
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implementing temporal-logic based verification algorithms, cf. [14], This connection to 
automata theory can also he extended to languages beyond LTL, such as ETL [34] and 
/xTL [30], 

In this paper we focus on model checking of prohabilistic models. Probabilistic 
models are widely used to describe randomized protocols, which are often used in dis- 
tributed protocols [5], communication protocols [8], robotics [27], and more. We use 
Markov chains as our probahilistic model, cf. [29] . A Markov chain induces a probabil- 
ity measure on sets of computations. The notion of correctness now becomes probabilis- 
tic: we say here that a program is correct if the probability that a computation satisfies 
the specification is one (we also discuss a quantitative notion of correctness, where we 
compute the probability that a computation satisfies the specification). Early approaches 
for probabilistic model checking of LTL formulas [19, 29] required determinization of 
NEW, which involves an additional exponential blow-up over the construction of au- 
tomata from formulas [24], requiring exponential space complexity, unlike the polyno- 
mial space complexity of standard model-checking algorithms for LTL, cf. [28]. 

The exponential gap for probabilistic model checking was bridged in [9, 10], who 
provided a polynomial-space algorithm, matching the lower bound of [26]. The algo- 
rithm in [9, 10] is specialized to LTL (and ETL) specifications, and proceeds by induc- 
tion on the structure of the formula. An automata-theoretic account of this algorithm 
was given in [11]. It is shown there that LTL formulas can be translated to a special 
type of NEW, which they call separated automata. (An NEW is separated if every 
two states that are located in the same strongly connected component have disjoint lan- 
guages). As with the standard translation of LTL formulas to NEW, the translation to 
separated automata is exponential. It is then shown in [1 1] how to model check Markov 
chains, in nondeterministic logarithmic space, with respect to separated NEW as com- 
plemented specification. This yields a polynomial space upper bound for probabilistic 
model checking of LTL formulas. 

The automata-theoretic framework in [11] is very specifically tailored to LTL. As 
mentioned earlier contrast, the automata-theoretic framework for model checking non- 
deterministic models is quite general and can also handle more expressive specification 
languages such as ETL and /iTL. This is not a mere theoretical issue. There has been a 
major recent emphasis on the development of industrial specification languages. These 
efforts resulted in a several languages [17,21,4,2], culminating in an industrial stan- 
dard, PSL 1.01 (www. accellera . com). Most of the new languages have the full 
power of NEW, i.e., they can express all w-regular languages. Thus, they are strictly 
more expressive than LTL [35], and, thus, not covered by the framework of [9-11]. 

In this paper we solve the general problem of probabilistic model checking with 
respect to w-regular specifications. As specification formalism, we use alternating Eiichi 
infinite-word automata (AEW); see discussion below. Thus, the problem we solve is: 
Given a Markov chain A1 and an AEW A, check whether the probability induced by A4 
of L{A) is one (i.e., whether Pm{L{A)) = 1). (A more refined problem is to calculate 
the probability precisely, see the full version.) 

The motivation for using AEWs as a specification formalism is derived from recent 
developments in the area of linear specification languages. Lirst, AEWs have been used 
as an intermediate formalism between LTL formulas and nondeterministic Eiichi word 
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automata (NBW). As shown in [12, 13], one can exploit the linear translation from LTL 
formulas to ABWs ([31]) for an early minimization, before the exponential translation 
to NBW. Second, not only can logics such as ETL and /iTL be easily translated to AB W, 
but also most of the new industrial languages can be translated to ABW. Furthermore, 
for some of them efficient such translations are known (cf. [1]). Thus, ABW can serve 
as a generic specification formalism for developing model checking algorithms. Note 
that applying the techniques of [9, 10] to ABW specifications requires converting them 
first to NBW at an exponential cost, which we succeed here in avoiding. 

We present here an algorithm for model checking of Markov chains, using ABWs 
as specifications. The space complexity of the algorithm is poly logarithmic in A4 and 
polynomial in A. The linear translation of LTL to ABW implies that this complexity 
matches the lower bound for this problem. 

As in [10, 11], our algorithm uses the subset construction to capture the language 
of every subset of states of A (an infinite word w is in the language of a set Q of 
states if ru G L(s) for every state s G Q and w ^ L{s) for every s ^ Q). While 
for LTL, a straightforward subset construction suffices, this is not the case for ABW. 
A key technical innovation of this paper is our use of two nondeterministic structures 
that correspond to the alternating automaton A to capture the language of every set 
of automaton states. The first nondeterministic structure is an NBW Af called the full 
automaton, and the second a “slim” version of the full automaton without accepting 
conditions, which we call the local transition system Ta- Every state q of A f and Ta 
corresponds to a set Q of states in A. While is possible, however, that a several states of 
A f correspond to the same set of states of A, every state of Ta corresponds to a unique 
set of states of A. The model-checking algorithm make use of the products G and Gf 
of the Markov chain M with Ta and Af, respectively. 

2 Preliminaries 

2.1 Automata 

Definition 1. A nondeterministic Biichi word automaton (NBW) is A= {S, S, Sq, S, F), 
where S is a finite alphabet, S is a finite set of states, 6 : S x S ^ 2^ is a transition 
function, Sq C S is a set of initial states, and F C S is a set of accepting states. 

Let w = Wq, wi, ... be an infinite word over E. For i G IN, let w* = Wi, Wi+i, . . . 
denote the suffix of w from its i’th letter. A sequence p = Sq, si, . . . in 5'“ is a run of A 
over an infinite word w G 27‘^,ifso G Sq and for every i > 0, we have Sj+i G 6{si,Wi). 
We use inf(p) to denote the set of states that appear infinitely often in p. A run p of A 
is accepting if inf(p) n F 0. An NBW A accepts a word m if .4 has an accepting 
run over w. We use L(A) to denote the set of words that are accepted by A. For s G S, 
we denote by .4^®^ the automaton A with a single initial state s. We write L{s) (the 
language of s) for L{A^^'^) when A is clear from the context. 

Before we define an alternating Biichi word automaton, we need the following def- 
inition. For a given set X, let be the set of positive Boolean formulas over X 

(i.e., Boolean formulas built from elements in X using A and V), where we also allow 
the formulas true and false. Let Y C X. We say that Y satisfies a formula 6 G B~^{X) 
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if the truth assignment that assigns true to the members of Y and assigns false to the 
members of 2f \ F satisfies 9. A tree is a set AT C ]N*, such that for x G N* and n G N, 
if xn G AT then x G X. We denote the length of a; by |a;|. 

An alternating Biichi word automaton (ABW) is ^ = (27, S, , S, F), where 27, 
S, and F are as in NBW, G S' is a single initial state, and S : S x E ^ 
is a transition function. A run of A on an infinite word w = Wq, wi, • ■ • is a (possibly 
infinite) S-labelled tree t such that r{e) = and the following holds: if |a;| = i, 
t(x) = s, and S{s,Wi) = 9, then x has k children x\, . . . ,Xk, for some k < 1^1. 
and {r(a;l), . . . , T{xk)} satisfies 9. The run r is accepting if every infinite branch in r 
includes infinitely many labels in F. Note that the run can also have finite branches; if 
|x| = i, t{x) = s, and h(s, ai) = true, then x need not have children. 

An alternating weak word automaton {AWW) is an ABW such that for every strongly 
connected component C of the automaton, either C CForCnF = 0. Given two 
AWW Ai and A 2 , we can construct AWW for 27‘^ \ L(Ai), L(Ai) H L(A 2 ), and 
L{Ai) U L{A 2 ), which are linear in their size, relative to A\ and A 2 [22]. 

Lemma 1. [16] Let A be an ABW. Then there exists an AWW Aw such that L{A) = 
L{Aw) and the size of Aw is quadratic in the size of A Furthermore, A^o can be 
constructed in time quadratic in the size of A. 



2.2 Markov Chains 

We model probabilistic systems by finite Markov chains. The basic intuition is that 
transitions between states are governed by some probability distribution. 

Definition 2. A Markov chain is a tuple A4 = (X, Ft, Pi) such that X is a set of 
states. Ft '. {X x X) — > [0, 1] is a transition probability distribution that assigns to 
every transition (xi,X 2 ) its probability. Ft satisfies that for every x\ G X we have 
'^X 2 &x ^t{xi,X 2 ) = 1. Pf : 2f — > [0,1] is an initial probability distribution that 
satisfies Ylx&x = 1- 

We denote by the Markov chain Xi with Pj that maps a; to 1. Sometimes we 

consider a Markov chain as a graph (AT, P) where (xi, 0 : 2 ) G E iff Pt{xi,X 2 ) > O.For 
an alphabet 27 = 2^^, let F : X ^ 27 be a labelling function, then each path p in X* 
or in X“ in Ai is mapped by F to a word w in 27* or in 27‘^ respectively. For simplicity 
we assume that 27 = X and that F(a;) = x for every x G X, this simplification does 
not change the complexity of verifying the Markov chain [10]. Note, that every infinite 
path of TVf is a word in X^ but the converse does not necessarily hold. The probability 
space on the set of pathes of Ai is defined as in [29]. 

The following property of Markov chains is called ergodicity and is proved in [15]. 
Let TVf be a Markov chain, then a path of Ai, with probability one, enters a bottom 
strongly connected component (BSCC) K of Ai, and contains every finite path in K 
infinitely often. In other words, let be the set of infinite words of Ai such that every 
word w in has a suffix that is contained in a BSCC K of Ai, and contains every 
finite path in K infinitely often. Then, PM{Le) = 1. 
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3 The Full Automaton and the Local Transition System 

In this section we capture the behavior of an AWW A using two nondeterminstic sys- 
tems. First we define the full automaton, which captures the languages of subsets of 
the states of A. Then we define the local transition system, which captures the local 
relations between subsets of states of A. 

3.1 The Full Automaton 

Given a AWW A = {S,S,S,F) (we ignore its initial state), we define its dual AWW 
A = {S, S, 6 , F), where the Boolean formula h(s, a) is obtained from ( 5 (s, a) by re- 
placing every true with false and vice versa, and every V with A and vice versa, in 
addition we define F = S' \ F. It is easy to see that A is an AWW. 

Lemma 2. [ 22 ] Let A be an AWW and A be its dual AWW For every state s we have 
thatL{A^^'>) = \F(i(^)). 

Given an AWW A and its dual AWW A we define the state space of the full automaton 
as a subset of 2 '^ x 2 ‘® x 2 '^ x 2 ‘®. We start with the following definition. 

Definition 3. A tuple (< 5 i, Q2, Qs, Q4) is consistent ifQ2 = S \ Qi, Qs C Qi \ F, 
and Q4 C Q2 \ F. 

Definition 4. Given an AWW A= {S, S, 5 , F) we define its full automaton as the NEW 
Af = (F, SfAf: Ff) where 

- Sf is the set of consistent tuples over 2 ^ x 2 ^ x 2 ^ x 2 ‘®. 

- A state (Q'i,Q' 2>Q3> Q4) '^/((Qi; Q2, Qs, Q4), cr) if Q[ h As 6 Qi( 5 (s, cr), 

Q2 1= AsgQ2<5(s, cr), and either: 

1 . Qs = Q4 = 0 , Q3 = Q'l \ F, andQ'i = Q'2\F 

2 . Qs 7^ 0 or Q4 0 , there exists I3 C Q'^ such that Y3 |= /\s^q^6{s, a) and 
Q3 = Y3 \ F, and there exists Y4 C Q'^ such that Y4 |= f\s^Qi 5 {s,a) and 
Q'4 = Y 4 \F 

- Ff = {{Qi,Q2, Q3, Q4) € S/IQ3 = Q4 = 0 )} 

Theorem 1. Let A be an AWW and let Af be its full automaton, let Q Q S be a set of 
states, then for every state {Qi, Q2, Q3, Q4) such that Qi = Q we have that 

Pi P L(A(")) = 

We now present more properties of the full automaton. We use these properties later. 

Definition 5 . Let A be an ABW and let w be an infinite word. We define the type ofw 
w.r.t. A as the set typej4{w) = accepts w}. 

The following lemma is a direct consequence of Theorem 1 . 

Lemma 3. Let Af be full automaton, and let {Q\, Q27 Q3, Q4) and {Q'l, Q'2, Q3, Q4) 
be states of Af. Then, 
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- //Qi = q; then 

- //-Qi ^ g; then 



) n L(^Wi"< 32 .Q 3 .QD) ^ I 



Lemma 3 and Theorem 1 imply that the first element of the states of character- 
izes a distinct language. 

Definition 6. The language ofQi is defined as L(Qi) = 



3.2 The Local Transition System 

As observed above, it is sufficient to look at Qx in order to determine the language 
of a state of Af. Recall that L{Qi) = We observe that if there 

exists a transition ((Qi, Q2, Qs, Q4), o', (<5i, Q2; Q37 Q4)) ^f’ then for every state of 

the form (Qi, g2, Ys, Y4) there exists a state {Q'l, Q'2, Y^, Y^) such that the transition 
((Qi,Q2, Y3, Y4), a, (g'l, g' , Fg', Y')) is in Sf. 

These observations imply that there are some local relationships between the lan- 
guages of the states of Af. Indeed, if a word w is in T((Q[, Q'2, Q^, Q^)) then for 
the word a ■ w that is in L{{Qi, Q2, Q3, Q4)), there exists a state of the form {Q'x, Q'2, 
Y3, Y4) that is in Sf{{Qi,Q2, Q3, Q4), o). Thus, we can say that there exists a transition 
on a from L{Qi) to L{Q'fi). The local transition system captures these relationships. 

Definition 7. Given an AWW A = {Y, S, 6 , F) we define its local transition system as 
r_4 = {S, St, St) where 

— St is the set of subsets of S and St is a function from St to 2 ^'^. 

- A state Q' is in St{Q, o) ifQ' |= AggQ( 5 (s, a) and {S \ Q') |= As^qS{s, a). 

Example 1 . We now present an example of a full automaton and a local transition sys- 
tem. The example is presented at Figure 1 . For simplicity we use a deterministic au- 
tomaton A. The hgure shows ^’s dual automaton A, the full automaton Af, and the lo- 
cal transition system Ta- Note that A has F = S, thus for every state {Qi, Q2,Qs, Q4) 
of we have Q4 = (d. For this reason, and since g2 is always equal to S' \ gi, we 
only write the sets Qi and Q3 inside the states. □ 

The dehnitions of the full automaton and the local transition system implies the follow- 
ing lemma: 

Lemma 4. Let A be an AWW and let Af and Ta be its full automaton and local tran- 
sition system respectively. Let {Qi, Q2, Q3, Q4) be a state ofAf, and let a be a letter 
in E. Then, for every state Q'4 we have that Q'^ is in St{Qi, o) iff there exists a state of 
the form (Qi, g^, Q3, Q4) < 5 /((Qi, Q2, Q3, Q4), cr). 

The proof of Lemma 4 is straightforward from the definitions of A f and Ta. In partic- 
ular for every state (gi, Q2,Qz, Q4) and inhnite word w we have that ^('3i>Q2,Q3.Q4) 
has a run on w iff T^^'^ has a run on w. However, we do not dehne accepting condi- 
tions for the local transition system. Thus, it is possible that has a run on w, but 

j^iQi,Q 2 ,Q 3 ,Qi) accepting run on w. 
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Fig. 1. An example of a full automaton Af and a local transition system Ta- 



Lemma 5. Let Ta be a local transition system, and let Q, Q' and Q" be states o/Ta- 
Let a be a letter in S. If Q” € St{Q' , cr) and Q” € 5t{Q, cr), then Q = Q' . 

When a transition system satisfies the property shown in Lemma 5, we say that the 
transition system is reverse deterministic. 

4 Verifying Markov Chains 

In this section we construct a product Gm,A of the Markov chain fA and the local 
transition system Ta- We show that the problem of checking whether Pm{L{A)) = 1, 
can be reduced to checking for a state {x, Q) of G whether the probability of L{Q) n 
X ■ is positive. Then, we show how to use the full automaton to solve this problem. 

Definition 8. Let A be an AWTK Ta be A ’s local transition system, and Aibea Markov 
chain. Afe define the graph Gm,A cis having vertex set (x, Q) such that x is a state of 
Ai and Q is a state ofTA- An edge {x, Q) {x' , Q') is included in Gm,A if Ai has a 
transition x ^ x' and (Q, x, Q') is a transition in Ta- 

When A and Ai are clear from the context, we write G instead of Gm.A- Lemma 5 
implies that for every three states (x, Q), (x', Q'), and (x", Q"), if there is a transition 
from (x, Q) to (x", Q”) and there is a transition from (x', Q') to (x", Q"), then x x' . 
We say that G is reverse deterministic. 
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Example 2. We present in Figure 2 two Markov chains M.\ and Ad 2 - We assume that 
the initial probability for each state in the Markov chains is i. The figure also presents 
the products G\ and G 2 of Adi and Ad 2 respectively, with the local transition system 
Ta from Example 1 . □ 




Fig. 2. Two Markov chains Mi and M 2 , and the graphs they impose Gi and G 2 . 



Every infinite path in G projected on the first component gives a path in Ad . Conversely, 
every path of Ad is the projection of at least one path in G. In fact, let w = xq • tci • . . . be 
a path of Ad . Eor each j , let Qj be the type of the suffix Xj-Xj+iXj +2 ■ ■ • • Then for each 
j there is a transition (Qy, Qy+i) in St and thus an edge (xj,Qj) (a^y+i , Qj+i) 
in G. We call this path the augmented path corresponding to w. 

Definition 9. For a state (x, Q) ofG we denote by P{x, Q) the probability that a path 
that starts in state x has type Q, namely P{x, Q) = Pm{-M.^^^ has type Q). We call 
(x, Q) probable if P{x, Q) > 0. 

The importance of the probable states is demonstrated in the following two lemmas. 

Lemma 6. Pm{L{Q)) 

Let X be a state of a Markov chain with P/(x) > 0. Then for every state (x, Q) we have 
that if (x, Q) is probable, then Pm{L{Q)) > 0. Thus we conclude Lemma 7. 

Lemma 7. Let s be a state of an AWW A and let M be a Markov chain. Then, 
Pm{L(s)) < 1 iff there exists a state x of M and a set Q Q S such that Pi{x) > 0, 
s ^ Q and the state (x, Q) is probable. 

Thus, in order to determine whether Pm{L{sq)) = 1, it is enough to determine the 
probable states of G. In the rest of this section we show how to identify the probable 
states. We define H as the restriction of G to the probable states. 
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Example 3. Look again at Figure 2. It is easy to see that given that a path of i starts at 
state a, the path is of the forma((6a) + a)“ with probability 1. a((6o) + a)‘^ is contained 
in L({1, 2}) U {(a6)“}, since P^^((a6)‘^) = 0, we have P{a,{l, 2}) = 1. Similarly, a 
path of All that starts at b is with probability one in L({1}), thus, P{b, {!}) = 1. This 
implies that in Gi, H is the subgraph induced by the states (a, {1, 2}) and (b, {!}). On 
the other hand, looking at AI 2 , we see that a path that starts at a is of the form (a6)“ 
and a path that starts at b is of the form (ba)^. Thus, in G 2 , H is the subgraph induced 
by the states (a, {!}) and (5, 0). □ 

We start with the following observation. Partition the language L{Q) according to the 
hrst letter of a word and the type of the suffix that starts from the second letter. Then, 
for every state (x, Q) of G we have P{x, Q) = Y,(x,q)^(x' ,Q') Pt{x, x') ■ P{x', Q'). 
Note that if {x, Q) {x' , Q') is an edge of G, then Pt{x, x') > 0 and thus P(x, Q) > 
Pt{x, x') ■ P{x' , Q'). Hence, if (x' , Q') is probable then all its ancestors are probable. 
This implies that it is sufficient to identify the BSCCs of H and then to construct the 
set of probable states using backward reachability. 

Let G be an SCC of G. If it contains some probable state {x, Q), then since all the 
states in G are ancestors of {x, Q), all states in G are probable. That is, either G is an 
SCC of iF or G n iF = 0. Recall that every path in G projects to a path in A1 . So the set 
of hrst components of all members of G are in the same SCC, say K, of j\4, which is 
the SCC of Ai containing x. We say that G corresponds to K. Note that distinct SCC’s 
of G may correspond to the same K. 

Theorem 2 characterizes the SCCs of G which are the BSCCs of P[. Before we 
present the theorem we need the following notation. For a tuple E = (Fli, E 2 , ■ ■ ■ En), 
we dehne TTi{E) = Ei to be the i’th element in E. This notation is extended naturally 
to sequences of tuples. 

Definition 10. A finite path pc in G is fulhlling if there exists a path pf in A f such 
that 7T2 {pg) = tti{pf), the first state of pf is of the form (Qi, Q 2 , (Qi \ A), {Q 2 \ F)), 
and the last state of pf is of the form {Q'l, Q' 2 , 0, 0)- 

Theorem 2. Let G be an SCC ofG. Then G is a BSCC of El iff it satisfies the following 
conditions: 

L G corresponds to a BSCC K of M.. 

2. Every finite path of K is a projection of a path in G. 

3. G contains a fulfilling path. 

5 Algorithms 

In Figure 3 we present the algorithm that determines for an AWW A and a Markov chain 
Ai whether Pm{L{A)) = 1. An extension for exact probability is presented in full 
version. Theorem 2 implies that the algorithm mark the BSCCs of El. Thus, B is the set 
of probable states. Lemma 7 implies that the algorithm returns true iff Pm{L{A)) = 1. 
Finding SCCs in G that correspond to BSCCs in A4, and doing backward reachability 
can be done in time linear in the size of G. The most complex part of the algorithm is 
to identify, SCCs G of G that satisfy: 
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Inputs: Markov chain M = {X,Pi,Pt), AWW A = {S , S, sq,5, F) . 

Construct the full automaton Af of A. 

Construct the local transition system Ta and the graph G. 

Mark all SCCs C of G that satisfy: 

1 . C corresponds to a BSCC K of A4 . 

2. Every finite path of Tf is a projection of a path in C. 

3. C contains a fulfilling path. 

Construct the set B of all states of G from which the marked 
SCCs are reachable. 

return true iff for every state (x,Q) £ B , if Pi{x)>Q, then so £ Q . 
Fig. 3. The model-checking algorithm. 



1. C corresponds to a BSCC K of A4. 

2. Every finite path of ilT is a projection of a path in C. 

3. C contains a fulfilling path. 

The following lemma is proved in [10]. The only property of G that they use is that G 
is reverse deterministic. 

Lemma 8. Let C be an SCC of G that corresponds to an SCC K o/ Ad. Then the 
following are equivalent: 

1. Every finite path in K is a projection of a path in G. 

2. No other SCC ofG corresponding to K is an ancestor of C. 

Lemma 8 implies that the second task is equivalent to checking whether there is no 
ancestor SCC of C that corresponds to K. This check can be easily done while scanning 
the SCCs of G. 

Example 4. In G\ at Figure 2 there are two SCC’s that correspond to the single BSCC 
of Adi. The SCC of (a, {1,2}) and (&,{!}) does not have ancestors and contains 
the fulfilling path (a, {1, 2}), (a, {1, 2}), (a, {1, 2}) that corresponds to the path 
({1, 2}, {1, 2}), ({1, 2}, {1}), ({1,2}, 0) in Af, thus, this SCC is the BSCC of H. In 
G 2 there are two SCCs that correspond to the single BSCC of Ad 2 and neither of them 
have an ancestor. However, only the SCC of (a, {1}) and (6, 0) has a fulfilling path, 
thus it is the BSCC of HL\ 



We now explain how to check whether an SCC C of G contains a fulfilling path. We 
construct the product G/ = Ad x Af, similarly to the construction of G. 

Definition 11. Let A be an AWTk Af be A’ s full automaton, and M be a Markov 
chain. We define the full graph G/ as having vertex set {x,{Qi,Q2,Qz,Qa)) such 
that X is a state of Ad and (Qi, Q2, Qs, Q4) is a state of Af. An edge 
{x,{Qi,Q2,Qa,Qa)) {x' ,{Q' i,Q'2iQ'ziQ'a)) ^s included in Gf if M. has a tran- 
sition X ^ x' and (Q}, Q21 Q31 Q4) '^/((Qi: Q2, Qz, Qa),x). 



Lemma 9. An SCC G of G contains a fulfilling path iff there exists a path 

{xo,{Qi,Q 2 .Ql.QA))AxiAQ\^Ql^QhQ\))i---Axn,{QiiQ 2 ,^^^)) Gf such 
that the path (xq, Q^), {x\,Q\), . . . (x„, Q") is contained in G, Q 3 = Qi\ F, and 

Ql = Ql\ F. 
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Complexity. Finding SCCs in G that correspond to BSCCs in A4, and doing backward 
reachability can be done in linear time and polylogarithmic space in |G|. Constructing 
BSCCs of Ai can be done in time linear in \A4\, identifying SCCs of G that corre- 
spond to these BSCC can be done in time linear in |G|. Marking SCCs that do not have 
ancestors that correspond to the same BSCC in AA can also be done in time linear in 
|G|. Checking that an SCC of G contains a fulfilling path can be done in time linear in 
\Gf\, simply by scanning G / and G in parallel, thus, the algorithm can be implemented 
in time linear in |A1 x Af\. Since the size of Af is we have that the time 

complexity of the algorithm is \AA \ ■ 

As for space complexity we show that algorithm works in space polynomial in |^| 
and polylogarithmic in \AA\. We rewrite the conditions of Theorem 2, Lemma 7, and 
Lemma 8 as follows: Pm{L{A)) < 1 iff there exists a probable state (xo,Qo) such 
that So ^ Q and F/(xo) > 0. This is true iff (xg, Qo) reaches a state (x, Q) that is in a 
BSCC of ff, So ^ Qo. and Pi{xo) > 0. That is 

1. {x, Q) is reachable from a state (xq, Qo) such that Pi{xo) > 0 and sg ^ Qo- 

2. X is in a BSCC of A4 (Theorem 2, (1)). This condition is equivalent to the follow- 
ing: for every state x' of A4 we have that if there exists a path in A4 from x to x' 
then there exists a path from x' to x. 

3. No other SCC of G that corresponds to the SCC of x in A1 is the ancestor of the 
SCC of (x, Q) (Lemma 8). This condition is equivalent to the following: for every 
state (x', Q'), if there exists a path from (x', Q') to (x, Q), then either there exists 
a path from (x, Q) to (x', Q'), or there is no path from x to x'. 

4. The SCC of (x, Q) contains a fulfilling path (Theorem 2, (3)). By Lemma 9 this 
condition is equivalent to the following: there exists a path in G/ from a state 
(x', (Qi, Q 2 , Qi \ F, Q 2 \ F)) to a state (x", (Q", Q 2 , 0, 0)) such that the pro- 
jection of the path on G is contained in the SCC of (x, Q). This condition is 
equivalent to: there is a path from a state (x',(Qi,Q' 2 ,Qi \ F,Q' \ F)) to a 
state (x", (Q", Q 2 , 0, 0)) in G/ and there are paths from (x, Q) to (x", Q"), from 
(x", Q") to (x', Q'), and from ((x', Q') to (x, Q) in G. 

In [25] it is shown that checking whether there is a path from one state to another in a 
graph with n states requires log‘^{n) space. This implies that the conditions above can 
be checked in space 0(Zop^(|G/|)) = = 0{log‘^{\A4\)+log{\A4\)- 

\A\ + \A\^) = 0{W{\M\) + \A\-^). 

6 Concluding Remarks 

We presented here an optimal solution to the general problem of linear-time probabilis- 
tic model checking with respect to w-regular specifications, expressed by alternating 
automata. Beyond the interest in the problem itself, our solution is interesting from a 
theoretical perspective, since the concept of full automaton may have other applications. 
More work is needed in reducing our result to practice. One direction is to extend the 
ProbaTaf system, which currently handles LTL specifications of Markov chain [11], 
to ABW specifications. Another, is to combine the symbolic approach to alternating 
automata [20] with the symbolic approach to probabilistic model checking [3]. 
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Abstract. We propose a new statistical approach to analyzing stochas- 
tic systems against specifications given in a sublogic of continuous 
stochastic logic (CSL). Unlike past numerical and statistical analysis 
methods, we assume that the system under investigation is an unknown, 
deployed black-box that can be passively observed to obtain sample traces, 
but cannot be controlled. Given a set of executions (obtained by Monte 
Carlo simulation) and a property, our algorithm checks, based on statisti- 
cal hypothesis testing, whether the sample provides evidence to conclnde 
the satisfaction or violation of a property, and computes a quantitative 
measure (p-value of the tests) of confidence in its answer; if the sample 
does not provide statistical evidence to conclude the satisfaction or vio- 
lation of the property, the algorithm may respond with a “don’t know” 
answer. We implemented our algorithm in a Java-based prototype tool 
called VeStA, and experimented with the tool using case studies ana- 
lyzed in [15]. Our empirical results show that our approach may, at least 
in some cases, be faster than previous analysis methods. 



1 Introduction 

Stochastic models and temporal logics such as continuous stochastic logic 
(CSL) [1, 3] are widely used to model practical systems and analyze their per- 
formance and reliability. There are two primary approaches to analyzing the 
stochastic behavior of such systems: numerical and statistical. In the numerical 
approach, the formal model of the system is model checked for correctness with 
respect to the specification using symbolic and numerical methods. Model check- 
ers for different classes of stochastic processes and specification logics have been 
developed [8,14,13,4,5,2,6]. Although the numerical approach is highly accu- 
rate, it suffers from being computation intensive. An alternate method, proposed 
by Younes and Simmons [16], is based on Monte Carlo simulation and sequen- 
tial hypothesis testing. Being statistical in nature, this approach is less accurate 
and only provides probabilistic guarantees of correctness. The approach does 
not assume knowledge of a specific formal model for the system being analyzed, 
and therefore can be potentially applied to analyzing complex dynamical sys- 
tems such as generalized semi-Markov processes (CSMPs), for which symbolic 



R. Alur and D.A. Peled (Eds.): CAV 2004, LNCS 3114, pp. 202-215, 2004. 
(c) Springer- Verlag Berlin Heidelberg 2004 




Statistical Model Checking of Black-Box Probabilistic Systems 203 



and numerical methods are impractical. However, the Younes and Simmons’ ap- 
proach assumes that the system is controllable (not black-box) and can be used 
to generate sample executions from any state on need basis. 

Both the numerical and the current statistical methods suffer from several 
serious drawbacks when it comes to analyzing practical systems. First, modern 
day systems are large heterogeneous, and assembled by integrating equipment 
and software from diverse vendors, making the construction of a formal model of 
the entire system often impossible and thus limiting the feasibility of numerical 
and symbolic methods. Second, for large network systems, meaningful experi- 
ments may involve dozens or even thousands of routers and hosts, which would 
mean that the system needs to be deployed before reasonable performance mea- 
sures can be obtained. However, once they are deployed, such systems cannot be 
controlled to generate traces from any state, making it impossible to generate 
execution samples on a need basis as is required by the Younes et. aFs statistical 
approach. 

Despite the success of current analysis methods [10,13,12,15,8], there is 
therefore a need to develop methods to analyze stochastic processes that can be 
applied to deployed, unknown “black-box” systems (systems from which traces 
cannot be generated from any state on need) In this paper we address these 
concerns by proposing a new statistical approach to model checking. Like in 
Younes et. al’s approach, discrete event simulation methods are used to obtain a 
set of sample executions; however, unlike their method we assume no control over 
the set of samples we obtain. We then test these samples using various statistical 
tests determined by the property to be verified. Since we assume that the sam- 
ples are generated before testing, our algorithm relies on statistical hypothesis 
testing, rather than sequential hypothesis testing. Our inability to generate sam- 
ples of our choosing and at the time of our choosing ensures that our approach 
differs from the previous statistical approach in one significant way: unlike the 
previous approach where the model checker’s answer can be guaranteed to be 
correct within the required error bounds, we instead compute a quantitative 
measure of confidence (the p- value, in statistical testing terminology [11]) in 
the model checker’s answer. Our algorithm computes the satisfaction of the de- 
sired formula by recursively determining the satisfaction of its subformulas (and 
the confidence of such an answer) in the “states” present in the sample. This 
presents a technical challenge because our algorithm, being statistical in nature, 
may be uncertain about the satisfaction of some formulas based on the given 
samples. The algorithm needs to compute useful answers (as far as possible) 
even in the presence of uncertain answers about the satisfaction of subformulas. 
We overcome this challenge by interpreting such “don’t know” answers in an 
adversarial fashion. Our algorithm, thus checks if the sample provides evidence 



^ We assume that the samples generated from the system by discrete event simulation 
have information about the “system state”. We, however, make no assumptions 
about the transition structure of the underlying system, nor do we assume knowledge 
about the transition probabilities; the system under investigation is black-box in this 



sense. 
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for the satisfaction or violation of a property and the confidence with which such 
an assertion holds, or gives up and says “don’t know.” The algorithm that we 
propose suffers from one drawback when compared with the previous statistical 
approach. Since we analyze a fixed sample, we will get useful answers only when 
there are sufficient samples for each “relevant state.” Therefore our method is 
likely to work well only when a finite set of samples is enough to provide sufficient 
information about relevant states. Examples of systems we can successfully an- 
alyze are Continuous-time Markov Chains (CTMCs) or systems whose relevant 
states are discrete, while we are unlikely to succeed for GSMPs in general. 

A closely related approach to analyzing stochastic systems based on Monte 
Carlo simulation is by Herault et. al. [7], which can model-check discrete-time 
Markov chains against properties expressed in an expressively weak logic ( “pos- 
itive LTL”). 

We have implemented the whole procedure in Java as a prototype tool, called 
VeStA (VErification based on Sxatistical Analysis)^. We have experimented 
with VeStA by applying it to some examples that have been previously analyzed 
in [15] and the results are encouraging. However, we suspect that VeStA would 
require a lot more space, because it stores the entire collection of samples it is 
analyzing. Even though space was not a problem for the examples we tried, we 
suspect that it may become an issue later. 

The rest of the paper is organized as follows. Section 2 defines the class of 
systems we analyze and the logic we use. In Section 3, we present our algorithm 
based on statistical hypothesis testing in detail. Details about VeStA and our 
case studies are presented in Section 4. In Section 5, we conclude and present 
possible directions for future research. 

2 Preliminaries 

2.1 Sample Execution Paths 

The verification method presented here can be independent of the system model 
as long as we can generate sample execution paths of the system; the model 
and its definition are very similar to [16]. We will assume that the system being 
analyzed is some discrete event system that occupies some state s G S, where S 
is the set of states of the system. The states in S that can effect the satisfaction 
of a property of our interest are called the “relevant states.” Note that the num- 
ber of relevant states may be quite small compared to the whole state space of 
the system. For example, for a formula (j>i U-*(j )2 the states that can be reached 
within time t are relevant. We assume that each relevant state can be uniquely 
identified and that information about a state’s identity is available in the exe- 
cutions. Since samples are generated before running our analysis algorithm, we 
require that a Monte Carlo simulation is likely to generate a sample that has 
enough “information” about the relevant states; if not our algorithm is likely to 
say that it cannot infer anything about the satisfaction of the property. 

^ Available from http://osl.cs.uiuc.edu/~ksen/vesta/ 
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We assume that there is a labeling function L that assigns to each state a set 
of atomic propositions (from among those appearing in the property of interest) 
that hold in that state; thus L : S ^ 2^^ , where AP is a set of relevant atomic 
propositions. The system remains in a state s until an event occurs, and then 
proceeds instantaneously to a state s' . An execution path that appears in our 
sample is thus a sequence 



to ti t2 
7T = So ^ Si ^ S2 ^ • 



where Sq is the unique initial state of the system, is the state of the system 
after the tth event and ti is the time spent in state Si. If the fcth state of this 
sequence is absorbing, then Si = Sk and ti = oo for all i > k. 

We denote the zth state in an execution tt by 7 t[z] = Si and the time spent 
in the zth state by <5(7 t, z). The time at which the execution enters state 7r[z + 1] 
is given by t(7t, z -b 1) = X)j=o The state of the execution at time t (if 
the sum of sojourn times in all states in the path exceeds t), denoted by 7r(t), is 
the smallest z such that t < t{tt, i + 1). We let Path{s) be the set of executions 
starting at state s. We assume Path{s) is a measurable set (in an appropriate 
cr-field) and has an associated probability measure. 



2.2 Continuous Stochastic Logic 

Continuous stochastic logic (CSL) is introduced in [1] as a logic to express prob- 
abilistic properties of continuous time Markov chains (CTMCs). In this paper 
we adopt a sublogic of CSL (excluding unbounded untils and stationary state 
operators) as in [16]. This logic excludes the steady-state probabilistic opera- 
tors and the unbounded until operators. We next present the syntax and the 
semantics of the logic. 

CSL Syntax 

0 ::= true \ a e AP \ ^(f> \ 4> A 4> \ Ptx>p{i>) 

V> ::= 0W-V I X0 



where AP is the set of atomic propositions, ixi G {<, <, >, >}, p G [0, 1], and 
t G K>q. Here 4> represents a state formula and 'ip represents a path formula. The 
notion that a state s (or a path tt) satisfies a formula <j) is denoted by s ^ ^ (or 
TT \= <p), and is defined inductively as follows: 

CSL Semantics 



s 1= true 
S ^ Ptxp{lp) 

TT 1= X0 

TT \= (j)\ ht-'(f)2 



s [= a iff a G AP{s) 

iW s 4> s 1= 01 A 02 iff s 1= 01 and s |= 02 

iff Proh{TT G Path{s) | tt [= 0} cxi p 
iff r(7T, 1) < 00 and 7 t[1] |= 0 

iff 3a; G [0, t], (7r(x) |= 02 and Vy G [0, x). 7r(y) j= 0i) 



A formula is satisfied by a state s if Pro& [path starting at s satisfies 

0] 1 X 1 p. To define probability that a path satisfies 0 we need to define a cr-algebra 
over the set of paths starting at s and a probability measure on the corresponding 
measurable space in a way similar to [5] . The path formula X0 holds over a path 
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if (f) holds at the second state on the path. The formula <j)i U-^<f )2 is true over a 
path 7T if (f >2 holds in some state along tt at a time x G [0,t], and (j) holds along 
all prior states along tt. This can also be recursively defined as follows: 

Sat{si TTi+l, 4>1 

= (t > 0) A {Sat{si, <j) 2 ) V {Sat{si, (pi) A Sat{ni+i, (pi U-*~*'(p 2 ))) (1) 

where Sat{s, <p) (or Sat{7r, ip)) are the propositions that s \= (p {or tt \= ip) . This 
definition will be used later to describe the algorithm for verification of (piU-*cp 2 
formula. 

3 Algorithm 

In what follows we say that s \=^ (p if and only if our algorithm (denoted by A) 
says that (p holds at state s. Since the algorithm is statistical, the decision made 
by the algorithm provides evidence about the actual fact. In our approach, we 
bound the strength of this evidence quantitatively by a number in [0, 1] which 
gives the probability of making the decision given that the decision is actually 
incorrect. In statistics, this is called the p-value of the testing method. We denote 
it by a ^ and write s |= (() if the state s actually satisfies p. 

We assume that we are given a set of finite executions. The length of a finite 
execution path must be large enough so that all the bounded until formulas can 
be evaluated on that path. Given a set of sample execution paths starting at the 
initial state and a formula p, the algorithm works recursively as follows: 

verify AtState{p, s){ 

if cache contains {p,s) return cache{p, a)\ 

else if (^ = true then (z, a) ^ (1, 0.0); 

else p = a € AP then {z, a) ^ verify Atomic{a, s)\ 

else if (^ = ^p' then {z, a) ^ verifyNot{-ip' , s); 

else 1± p = pi /\ p 2 then (z, a) <— verifyAnd{pi A p 2 ,s)-, 

else if = PtxipiP) then (z, a) ^ verifyProb{Vc^p{p) , s); 

store {s,p) (z,a) in cache-, 

return (z, a); 

} 

where verifyAtState returns a pair having 0, 1, or undecided corresponding to 
the cases s p, s p, or A cannot decide respectively, as the first compo- 
nent, and p- value for this decision as the second component. To verify a system 
we check if the given formula holds at the initial state. Once computed, we store 
the decision of the algorithm for p at state s in a cache to avoid recomputation. 
The result of our hypothesis testing can be shown to hold in presence of caching. 
This results in a significantly faster running time and a reduction in the sam- 
ple set size. In the remainder of this section we define the various procedures 
verifyAtomic, verifyAnd, verifyNot, and verifyProb recursively. 

The key idea of the algorithm is to statistically verify the probabilistic oper- 
ator. We present the corresponding procedure verifyProb below. 

This should not be confused with the Type I error which is also denoted by a. 
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3.1 Probabilistic Operator 

We use statistical hypothesis testing [11] to verify a probabilistic property (j) = 
'Pc^pW at a given state s. Without loss of generality we show our procedure for 
(j) = V>p{tp). This is because, for the purpose of statistical analysis, P<p('i/') is 
essentially the same as ^P>i_p(f/)) and < (or >) is in effect the same as < (or 
>). Let p' be the probability that V' holds over paths starting at s. We say that 
s 1= V>p{ip) if and only ii p' > p and s V>p{'4>) if and only if p' < p. We 
want to decide either s |= V>p{tp) or s V>p{tp). Accordingly we set up two 
experiments. In the first experiment, we use sample execution paths starting 
at s to test the null hypothesis Hq : p' < p against the alternative hypothesis 
Hi' p' > P- In the second experiment, we test the null hypothesis Hq: p' > p 
against the alternative hypothesis Hi: p' < p . 

Let the number of sample execution paths having a state s somewhere in the 
path be n. We can treat the portion of all these paths starting at s (suffix) as 
samples from Path{s). Let Ai, A 2 , . . . , A„ be a random sample having Bernoulli 
distribution with unknown parameter p' € [0, 1] i.e. for each i G [1, n], Prob[Xi = 
1] = p' ■ Then the sum Y = Xi+ X 2 -I- . . . + A„ has binomial distribution with 
parameters n and p' . We say that Xi, an observation of the random variable 
Xi, is 1 if the sample execution path satisfies ip and 0 otherwise. In the 
first experiment, we reject Hq: p' < p and say s V>p{'ip) if > p and 
calculate the p- value as a = Proh[s \=_a. (p \ s ^ cp] = Prob[Y > ^Xi \ p' < p\. 
Note we do not know p' . Therefore, to calculate a we use p which is an upper 
bound for p' . If we are not able to reject Hq in the first experiment then we do 
the second experiment. In the second experiment, we reject Hq: p' > p and say 
s ’^>p(V') if < P ^nd calculate the p- value as a = Prob[s ^a I s h 
(p] = Prob[Y < ^ Xi \ p' > p\. Thus, a smaller a represents a greater confidence 
in the decision of the algorithm A. 



3.2 Nested Probabilistic Operators 

The above procedure for hypothesis testing works if the truth value of ip over 
an execution path determined by the algorithm is the same as the actual truth 
value. However, in the presence of nested probabilistic operators in ip, A cannot 
determine the satisfaction of ip over a sample path exactly. Therefore, in this 
situation we need to modify the hypothesis test so that we can use the inexact 
truth values of ip over the sample paths. 

Let the random variable A be 1 if a sample execution path tt actually satisfies 
ip in the system and 0 otherwise. Let the random variable Z he 1 for a sample 
execution path tt if tt \=^ ip and 0 otherwise. In our procedure we cannot get 
samples from the random variable A; instead our samples come from the ran- 
dom variable Z . Let A and Z have Bernoulli distributions with parameters p' 

^ While handling nested probabilistic operators, these experiments will no longer be 
symmetric. Moreover, setting up these two experiments is an alternate way of getting 
at a conservative estimate of what Type II error (/3 value) may be. 
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and p" respectively. Let Zi, Z 2 , ■ ■ ■ , Zn be a random sample from the Bernoulli 
distribution with unknown parameter p" e [0, 1]. We say that Zi, an observation 
of the random variable Zi, is 1 if the algorithm says that the z**' sample execution 
path satisfies ip and 0 otherwise. 

For the formula (p = V>p(ip) we calculate regions of indifference, denoted by 
the fractions (5i and <52, based on the algorithm’s decision for the satisfaction 
of Ip over the different sample paths. Depending on the values of i5i and 62 we 
set up the two experiments. In the first experiment, we test the null hypothesis 
Ho' p" < P + 5\ against the alternative hypothesis Hi: p” > p + 5\. If we get 
> p + 5\ we reject Hq and say s V>p{ip) with p- value a = Prob[s 
(p \ s ^ (p] = Prob\^Zi > ^ Zi \ p” < p + Si]. If we fail to reject Pfo we 
go for the second experiment, in which the null hypothesis Hq : p" > p — 62 
against the alternative hypothesis Hi: p” < p — 52 - We reject Plo and say that 
s V>p{ip) if < p — S 2 and calculate the p- value as a = Prob[s (p \ 
s ^ (/)] = Prob[^Zi < ^Zi Ip" > p — 62 ]. Otherwise, we say the algorithm 
cannot decide. 

We now show how to calculate i5i and 62 - Using the samples from Z we can 
estimate p". However, we need an estimation for p' in order to decide whether 
(p = P^p{ip) holds in state s or not. To get an estimate for p' we note that the 
random variables X and Z are related as follows: 

Prob[Z = 0 I X = 1] < a' Prob[Z = 1 | X = 0] < a' 

where a' is the p-value calculated while verifying the formula ip. By elementary 
probability theory, we have 

Prob[Z = 1] = Prob[Z = 1 | X = 0]Prob[X = 0] + Prob[Z = 1 | X = l]Pro6[X = 1] 



Therefore, we can approximate p" = Prob[Z = 1] as follows: 

Prob[Z = 1] < a' {1 — p') + l.p' = p' + (1 — p')a' 

Prob[Z = 1] > Prob[Z = 1 | X = l]Profc[X = 1] > (1 - a')p' = p' - a'p' 

This gives the following range in which p" lies: 

p — a p < p < p [1 — p )a 



Hence, Prob\Y, Z,, > \ P' < P] < Prob[YZ^ > Y^i \ P' = P\ < 

Prob[YZi > Y^i \ P" = P ~ <^'p] which gives 62 = a'p. Similarly, we get 
(5i = (1 -p)a'. 

Note that the p-value obtained while verifying ip over different sample paths 
are different. We take the worst or the maximum of all such p- values as a' . 
Moreover, since A can say true, false, or cannot decide, note that the A may not 
have a definite true or false answer along certain sample paths. For such paths 
the algorithm will assume the worst possible answer in the two experiments. For 
the first experiment, where we check whether > p-|-5i, we take the answers 
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for the sample paths for which A cannot decide as false and the p- value as 0. 
For the second experiment we consider the answer for the undecided paths to 
be true and the p-value as 0. This allows us to obtain useful answers even when 
the sample does not have enough statistical evidence for the satisfaction of a 
subformula. 

Thus we can define the procedure verifyProb{'P>p{4’) , s) as follows: 
verifyProb{P>p{ip),s){ 

zsum„iin ^ 0; zsummax ^ 0; a' ^ 0.0; n ^ 0; 
for each sample path tt starting at s{ 

{z,a") ^ verifyPath{ip, tt)-, 

if a = undecided then {zsummin ^ zsummin + 1; zsummax <— zsummax + 0;} 
else {zsummin ^ zsummin + t; zsummax ^ zsummax + t;} 
a' <— max{a' , a”); n <— n -I- 1; 

} 

if zsurnmaxjn > p + {1 — p)a' then 

return (1, Prob[J2 Zi > zsummax \ p" = p + {1 — p)a']); 
else if zsurnminjn < p — pa then 

return (0, Prob[J2 Zi < zsummin \ p” = p — pa']); 
else return {undecided, 0.0); 

} 

One can calculate Proh'f^ Zi > zsummax \ p" = P+{^—p)of] (or Proh^f^ Zi > 
zsummin \ p” = P ~ pci']) by noting that ^ Zi has binomial distribution with 
parameters p -I- (1 — p)a' (or p — pa') and n. 



3.3 Negation 

For the verification of a formula ~^(j) at a state s, we recursively verify cj) at state 
s. If s |=_4 4> with p- value a we say that s -^(j> with p- value Prob[s ~^(j> \ 
s 1= = Prob[s |=yi 4> \ s ^ (p] = a. Similarly, if s p with p-value a then 

s 1=^ —op with p-value a. Otherwise, if A cannot answer the satisfaction of p at 
s, we say that A cannot answer the satisfaction of —•p at s. Thus we can define 
verifyNot as follows: 

verifyNot{^p' , s){ 

(z,a) <— verifyAtState{p' ,s); 

if 2 ; =undecided then return {undecided, 0.0); else return (1 — 2 , a); 

} 

3.4 Conjunction 

To verify a formula p = p\ f\ p 2 & state s, we verify p\ and p 2 at the state s 
separately. Depending on the outcome of the verification of p\ and p 2 , A decides 
for p as follows: 
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1. If s 4>i and s \=a 4>2 with p- values a\ and a 2 respectively, then s \=a 4>- 
The p- value for this decision is Prob[s \=_a. (f>i A (j)2 \ s <j)\ A Note that 
s <f)i A4>2 holds in three cases, namely 1 ) s (j)i and s \= (j)2, 2 ) s \= <j)i 
and s ^ (p2, and 3) s ^ </>i and s ^ 02- Thus, the p- value for the decision 
of s 1=^ (j) can be taken as the maximum of the p-values in the above three 
cases which is max{ai,a2). 

2. If s 4>i (or s 4>2) and either s |=^ <1)2 or A cannot decide 4>2 at s (or 
either s |=^ (j>i or A cannot decide (j>i at s), then s (j) and the p- value is 
Prob[s (t>i A (j)2 \ s \= (pi and s ^ ^ 2 ] = oi + 0!2 (or «2 + ai)- 

3. If s (pi and s (p2 then s (pi A <p2- The p- value for this decision is 
Prob[s (pi A (p2 \ s \= (pi and s\= (p2]<ai+ « 2 - 

4. Otherwise, A cannot decide (p. 

Thus we can define the procedure verifyAnd as follows: 

verify And[(pi f\(p2,s){ 

{zi,ai) ^ verifyAtState{(pi, s)-, (22,02) <— verifyAtState{(p2,s)\ 

if 2i = 1 and 22 = 1 then return (l,maa;(Q;i, 02)); 

else if 2i = 0 and 22 7^ 0 then return (0, Oi + 02); 

else if 2i 7^ 0 and Z2 ~ 0 then return (0, Oi + 02); 

else 2i = 0 and Z2 ~ 0 then return ( 0 , Oi + 02); 

else return {undecided, 0.0); 

} 

3.5 Atomic Proposition 

In this simplest case, given (p = a and a state s, A checks if s a or not 

by checking if a G AP(s) or not. If a G AP(s) then s (p with p-value 0. 

Otherwise, s (p with p-value 0. 

verifyAtomic{a, s){ 

JA a G AP{s) then return ( 1 , 0 . 0 ); else return ( 0 , 0 . 0 ); 

} 



3.6 Next 

To verify a path formula ^p = 'X.(p over a path tt, A verifies (p at the state 
7 t[ 1]. If 7 t[I] \=yi (p with p-value a then tt |=^ ^p with p-value ex. Otherwise, if 
^[1] '^A (p with p-value a then tt ip with the same p-value. Thus we can 
define verify Path for X(^ as follows: 

verifyPath{^(p, n){ 

return verifyAtState{(p, 

} 

3.7 Until 

Let Ip = (pi U-*(p 2 be an until formula that we want to verify over the path tt. 
We can recursively evaluate the truth value of the formula over the path tt by 
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following the recursive definition ^iven by Equation 1 . Given the truth value and 
the p- value for the formula (piU-* over the suffix 7Ti+i, we can calculate the 
truth value of (j)i U-* (f >2 over the path tt^ by applying the decision procedure for 
conjunction and negation. Observe that the recursive formulation in Equation 1 
can be unrolled to obtain an equation purely in terms of conjuction and negation 
(and without any until formulas); it is this “unrolled” version that is used in the 
implementation for efficiency reasons. 

4 Implementation and Performance 

We have implemented the above algorithm as part of a prototype Java tool 
called VeStA. We successfully used the tool to verify several programs having 
a CTMC model® . The performance of the verification procedure depends on the 
number of samples required to reach a decision with sufficiently small p- value. To 
get a smaller p-value the number of samples needs to be increased. We need a lot 
of samples only when the actual probability of a path from a state s satisfying a 
formula ip is very close to the threshold p in a formula V^p{ilj) whose satisfaction 
we are checking at s. 

To evaluate the performance and effectiveness of our implementation we did 
a few case studies. We mostly took the stochastic systems used for case studies 
in [15]. The experiments were done on a 1.2 GHz Mobile Pentium III laptop 
running Windows 2000 with 512 MB memory®. We did not take into account the 
time required for generating samples: we assumed that such samples come from 
a running system. However, this time, as observed in some of our experiments, 
is considerably less than the actual time needed for the analysis. We generated 
samples of length sufficient to evaluate all the time-bounded until formulas. In 
all of our case studies we checked the satisfaction of a given formula at the ini- 
tial state. We give a brief description of our case studies below followed by our 
results and conclusions. The details for the case studies can be obtained from 
http : / /osl . cs . uiuc . edu/~ksen/vesta/. 

Grid World: We choose this case study to illustrate the performance of our 
tool in the presence of nested probabilistic operators. It consists of an n x n 
grid with a robot located at the bottom- left corner and a janitor located at the 
top-right corner of the grid. The robot first moves along the bottom edge of a 
cell square and then along the right edge. The time taken by the robot to move 
from one square to another is exponentially distributed. The janitor also moves 
randomly over the grid. However, either the robot or the janitor cannot move to 
a square that is already occupied by the other. The robot also randomly sends 
a signal to the base station. The underlying model for this example is a GTMG. 
The aim of the robot is to reach the top-right corner in time T\ units with prob- 
ability at least 0.9, while maintaining a minimum 0.5 probability of communica- 

® We selected systems with CTMC model so that we can compare our results with 
that of existing tools. 

® [15] used a 500 MHz Pentium III. However, our performance gain due to the use of 
faster processor is more than offset by the use of Java instead of C. 
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Fig. 1. Grid world: verification time and number of samples versus number of states. 



tion with the base station with periodicity less than T 2 units of time. This can be 
specified using the CSL formula V>o. 9 {V>o. 5 {trueU-"’"'^ communicate)U-"'"^ goal) . 

We verified the CSL property for Grid World with n G [1, 100]. The property 
holds only for n G [1, 13]. The state space of the program is 6>(n^). In Fig.l we 
plot the results of our experiment. The graph shows that for n closer to 13 the 
running time and the number of samples required increases considerably to get a 
respectable p- value of around 10“®. This is because at n = 13 the probability that 
V>o. 5 {true U-'^^ communicate) goal holds over an execution path becomes 
very close to 0.9. We found that our graphs are similar to [15]. 

Cyclic Polling System: This case study is based on a cyclic server polling 
system, taken from [12]. The model is represented as a CTMC. We use N to 
denote the number of stations handled by the polling server. Each station has a 
single-message buffer and they are cyclically attended by the server. The server 
serves the station i if there is a message in the buffer of i and then moves on to 
poll the station (i-bl) modulo N. Otherwise, the server starts polling the station 
t-b 1 modulo N. The polling and service times are exponentially distributed. The 
state space of the system is 0{N.2^). We verified the property that “once a job 
arrives at the first station, it will be polled within T time units with probability 
at least 0.5.” The property is verified at the state in which all the stations have 
one message in their message buffer and the server is serving station 1. In CSL 
the property can be written as (mi = 1) ^ V>o. 5 {trueU-'^{s = 1 A a = 0)), 
where mi = 1 means there is one message at station 1, and s = 1 A a = 0 means 
that the server is polling station 1. 

Tandem Queuing Network: This case study is based on a simple tandem 
queuing network studied in [9]. The model is represented as a CTMC which 
consists of a M/C 0 X 2 / 1-queue sequentially composed with a M/M/ 1-queue. We 
use N to denote the capacity of the queues. The state space is 0{N^). We verified 
the CSL property V<^o. 5 {trueU-"’"full) which states that the probability of the 
queuing network becoming full within T time units is less than 0.5. 

The results of the above two case studies is plotted in Fig. 2. The character- 
istics of the graphs for both the examples are similar to that in [15]. However, 
while achieving a level of confidence around 10“®, the running time of our tool 
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Fig. 2. Polling System and Tandem Queuing Network: Running time versus the pa- 
rameter T in CSL formula. 



for these cases is faster than the running time of the tool described in [15]. It 
is important to observe that, unlike the work of [15], VeStA cannot guaran- 
tee that the error of it’s probabilistic answer is bounded; the p-value computed 
depends on the specific sample. 

We could not compare the number of samples for these case studies as they 
are not available from [15]; theoretically sequential hypothesis testing should 
require a smaller sample size than simple hypothesis testing to achieve the same 
level of confidence. While in our case studies we never faced a memory problem, 
we suspect that this may be a problem in very big case studies. We observed 
that, although the state space of a system may be large, the number of states 
that appeared in the samples may be considerably smaller. This gives us hope 
that our approach may work as well for very large-scale systems. 

5 Conclusion and Future Work 

We have presented a new statistical approach to verifying stochastic systems 
based on Monte Carlo simulation and statistical hypothesis testing. The main 
difference between our approach and previous statistical approaches is that we 
assume that the system under investigation is not under our control. This means 
that our algorithm computes on a fixed set of executions and cannot obtain sam- 
ples as needed. As a consequence, the algorithm needs to check for the satisfac- 
tion of a property and compute the p- value of its tests. Since the sample may not 
provide sufficient statistical evidence to conclude the satisfaction or violation of a 
property, one technical challenge is to provide useful answers despite insufficient 
statistical evidence for the satisfaction of subformulas. We implemented our ap- 
proach in Java and our experimental case studies have demonstrated that the 
running time of our tool is faster than previous methods for analyzing stochastic 
systems in at least some cases; this suggests that our method may be a feasible 
alternative. 

Another important challenge is the amount of memory needed. Since we 
store all the sample executions, our method is memory intensive, and though 
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we did not suffer from memory problems on the examples studied here, we sus- 
pect that it will be an issue when analyzing larger case studies. Hence, there 
is a need to design efficient data structures and methods to store and compute 
with a large set of sample executions. We suspect statistical hypothesis testing 
approaches (as opposed to sequential hypothesis testing approaches) might be 
extendible to check liveness for certain types of systems, possibly by extract- 
ing some additional information about the traces. Note that liveness properties 
are particularly complicated by the fact that the operators may be nested. We 
are currently exploring that direction. Finally, it would be interesting to apply 
statistical methods to analyze properties described in probabilistic logics other 
than CSL. 
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Abstract. We propose a compositional specification and verification approach 
based on GSTE (Generalized Symbolic Trajectory Evaluation). There are two 
main contributions. Eirst, we propose a specification language that allows con- 
current properties be described succinctly in a compositional algebraic manner. 
Second, we show a precise model checking solution for a compositional specifi- 
cation through automata construction, but much more importantly and practically, 
we develop an efficient model checking algorithm for directly verifying the com- 
positional specification. At the end, we show the result of our approach in the 
verification of a micro-instruction scheduler in a state-of-the-art microprocessor. 



1 Introduction 

GSTE is a symbolic model checking solution that combines the high capacity and ease 
of use of STE with the expressive power (Q-regular languages) of classic symbolic 
model checking [7,22,23]. It has been successfully applied to the formal verification 
of complex Intel designs with tens of thousands of state elements [22, 5, 21]. 

The specification language in GSTE is called assertion graphs, an operational for- 
malism based on a special form of regular automata with assertion letters (antecedent 
and consequent pairs) as its input alphabet. Each word in the language of an assertion 
graph provides both a sequential stimuli for simulating the system and the expected 
sequential responses. The sequential nature of an assertion graph, however, hinders its 
ability to succinctly describe the concurrent behavior of a circuit. 

In this paper, we present a compositional approach based on GSTE to overcome 
the limitation. Eirst, we propose a specification language that allows the concurrent 
behavior of a system to be specified succinctly in a compositional manner. Such a com- 
position is logical and does not rely on a deep understanding of the implementation 
details of the system. The language is an extension of the GSTE specification language 
with a new meet operator and is expressed in the form of Process Algebra [13, 19, 10]. 
Second, although we show that the compositional specification can be precisely model 
checked, we develop a much more efficient and practical solution to directly verify the 
compositional specification. The solution extends the GSTE model checking algorithm 
[22] with the ability to walk through the syntactical structure of the specification and 
establish a simulation relation from the language elements of the specification to the 
sets of states in the circuit. This avoids the exponentially expensive global assertion 
graph construction. 

R. Alur and D.A. Peled (Eds.): CAV 2004, LNCS 3114, pp. 216-228, 2004. 

© Springer- Verlag Berlin Heidelberg 2004 
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There have been extensive studies on concurrent system specification and verifica- 
tion, most notably along the lines of hierarchical state machines (e.g. Statecharts) [9, 
14, 4, 3, 2] and Process Algebra [19, 13, 10, 6]. However, most of these approaches have 
mainly focused on specification formalisms, correctness of specifications, and modular 
refinement strategies. None has provided an efficient and practical model checking so- 
lution to verify a concurrent specification against an implementation. Relying on model 
checking the global transition system for the specification is prohibitively expensive. 

In recent years, the assume-guarantee based compositional approach has been gain- 
ing popularity [20, 19, 15, 8, 16-18, 11, 12], driven by the need to deal with the capacity 
limitation of symbolic model checking. In this framework, a circuit under verification is 
described as a parallel composition of finite state components, and the correctness of the 
circuit is described as a collection of local properties, each of which specifies the cor- 
rectness of one component assuming that the correctness of the interfacing components. 
This approach achieves verification efficiency by model checking each local property 
against its component separately, and then establishing the global correctness using 
an inductive assume-guarantee reasoning. A main drawback of the approach, however, 
is that the specification is heavily implementation-dependent, requiring the deep un- 
derstanding of how these components interact with each other. Therefore, it is rather 
manual, labor-intensive and sensitive to the changes in the implementation. 

We firmly believe that a practical approach must contain two ingredients, a formal 
language to support succinct concurrent specifications, and an efficient solution (such 
as model checking) to verify such a specification against a complex Implementation. 
Our GSTE based approach addresses both. The rest of the paper is organized as fol- 
lows. In Section 2, we define assertion languages for GSTE and their trace semantics. 
In Section 3, we introduce the new binary meet operator □ for assertion languages. In 
Section 4, we present the compositional specification language for GSTE in a form of 
algebraic equations [13, 19, 10], and show that this language is well defined. We also 
prove a regularity result for a compositional specification, i.e., the limit of a repeated 
application of □ to any assertion language in the specification is regular. Since the limit 
is trace equivalent to the original language, this gives us a way to precisely model check 
the compositional specification using the GSTE model checking algorithm in [22, 23]. 
However, the construction of the regular automaton for the limit may cause an expo- 
nential blow-up in the size of the specification. Therefore in Section 5, we develop a 
GSTE model checking algorithm for directly verifying a compositional specification. 
In Section 6, we briefly discuss the result of the compositional GSTE approach in the 
verification of a micro-instruction scheduler from an Intel microprocessor design. 

2 Assertion Languages 

For the entire scope of the paper, we assume a finite, non-empty alphabet 2) called the 
domain. An co-trace (or simply trace), denoted by T = d\d 2 . . ., is any co-word in 2)“. 
For a circuit under verification, the domain T> is the set of all states in the circuit, and a 
trace is simply any infinite sequence of states. 

We define the assertion alphabet E over domain T) as the tuple E = P(lD) x P(lD) 
where P(lD) denotes the power set of 27. We call any letter in E an assertion letter, any 
word in E* an assertion word, and any language in P(E*) an assertion language. 
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Given an assertion letter a = (A, ®), A and '3 are called the antecedent and the con- 
sequent of the letter, denoted by ant{a) = A and cons{c) = The antecedent and con- 
sequent functions are point-wise generalized to an assertion word w in 2*, i.e., ant{yv) 
and cons{w) as defined by 

VI < i < |w|,ant(w) [i] = ant{w[i\),cons{w)[{\ = cons{w[i\). 

Assertion graphs dehned in [22, 23] are basically regular automata (or co-regular 
automata if an additional fairness condition is specified) for generating regular assertion 
languages. We shall use the following simple voting machine example throughout the 
paper to illustrate various concepts in the paper. 

Example 1. (Voting Machine) A voting machine (VM) consists of three voting stations 
(Figure 1). Each voting station receives a single vote through vin[i] (1 < i < 3). Once 
all stations have received their votes, the VM conducts a poll among the three stations 
and produces a final voting result vout. It then clears all voting stations to accept a new 
round of votes. Output signal av[i] tells if the i-th voting station is available. Signal reset 
resets the VM to its initial state. For simplicity, we ignore the content of a vote and that 
of the final result, and just care about control/status signals. 



av[l] 

vin[l] 

av[2] 

vin[2] 

av[3] 

vin[3] 



reset 




Fig. 1. Voting Machine (VM) 



The domain D for the VM is the set of values for the external signals. For instance, 

[reset = 0, v;n[l] = l,av[l] = \,vin\2] = 0,av[2] = 0,vin[3] = 0,av[l] = l,vout = 0] 

is a value in D. We use state predicates over D to represent sets of values in D, for 
instance, vin[\] V ^v/n[2] represents all values in D where either vin\\] = 1 or vin[2] = 0. 
The assertion word for the VM 

{{re set, true), (vi«[l] A ^vin[2] A ^vin[3],true), {vin[2] A vin[3],true), {true, vout)) 

says that after reset, if vote v;m[1] comes in followed by votes vin[2] and vin[3], then the 
VM produces the final result in the next time. □ 

Note that the specification in Figure 1 does not address how the system is imple- 
mented. As we shall show later in the paper, our compositional approach allows one 
to state the correctness of the system succinctly without the worry of implementation 
details. It empowers the model checking algorithm to connect the specification with the 
implementation of the system. 
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An assertion language is in some sense an operational specification formalism, 
where each assertion word in the language provides both a sequential stimuli for simu- 
lating the system and the expected sequential responses. Many or even an infinite num- 
ber of words may be needed to cover all possible behaviors of the system. An assertion 
graph in [22, 23] is a way to use a labeled finite graph to capture theses words. However, 
it can still be quite a cumbersome way to describe a system with inherent concurrency. 
For instance, there are l-|-3x2!-|-3!=13 different orders for the three votes to arrive 
to the VM, each of which must be captured by a path in the assertion graph for the VM, 
although the specification does not care about in which particular order the votes arrive. 
The compositional extension in this paper addresses this issue. 

In the following, we define the semantics of an assertion language. We say a trace 
T over £) satisfies a word n over P(lD), denoted by x|=7t, iff VI < / < |jt|, x[i] C 7t[/]. 
We say T satisfies an assertion word w in E*, iff x\=ant{w) => x\=cons{w). The trace 
language of an assertion word w, denoted £2(w), is the set of all traces satisfying w, i.e., 

f2(vi;) = {x G I (1) 

The trace language of an assertion language L, denoted by Q.{L), is the intersection of 
the trace languages of the assertion words in L, i.e.. 



This is the same as the semantics for assertion graphs in [22,23]. Because of the 
V-semantics, the union U of two assertion languages becomes stronger and yields fewer 
traces, in contrast to the traditional wisdom. The V-semantics is the basis for efficient 
GSTE model checking. The following theorem shows that a language with fewer asser- 
tion words yields more traces. 

Theorem 1 . Li C L2 ^ D Q.{L2). 



3 The Meet Operator 



To facilitate compositional specifications, we introduce a meet operator □ : E x E ^ E 
that takes two assertion letters Oi , 02 G E and produces another assertion letter such that 

anf(oina 2 ) = ant{oi)r]ant{ 02 ), and coMi'(oino 2 ) = cons{Oi)r]cons{ 02 ). (3) 



The meet operator is applied point- wise to two words wi,W 2 G E* of the same length, 
denoted by wiriw 2 , such that 



winw2 = 



8 if WI = W2 = 8 

(vvjriwj) • (01(102) if vvi = w'[ -Oi and ^2 =w'„- O2, 



(4) 



where • is the language concatenation operator. It is not difficult to show that the opera- 
tor is associative, commutative and idempotent. Finally, the meet operator is generalized 
to two languages £[,£2 G P(E*): 



£in£2 = {wir\W2 I Wl G Li,W2 G £ 2 , |wi| = |W2|}. 



(5) 
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Example 2. Consider the VM specification in Example 1. Language (1 < / < 3) 



^[i] = {reset, true) ■ {^vin[i\,true)* ■ {vin[i\,true) ■ {^vin[i\,true)* 

describes the first vote at the /-th station. The meet T'l □ T' 2 n T'a succinctly describes all 
possible sequences of getting the first three votes without the explicit enumeration. □ 

In the following, we define a repeated application of □ over a language L, ri^L, as 



iL ifA:=0 

\{rf-'^L)nL\fk>Q. 



(6) 



We can show that although the resulting meet language may produce new words, it is 
trace equivalent to the original language. 



Lemma 1. For all k>0, 

1. rfL<zrf+'^L, 

2. Q.{vfL) = Q.{vf+'^L). 

Now consider the limit Um>o for the ascending chain {rf' L, r\^L, ...). 

Theorem 2. 

1. ( Containment) L C |Jm>0 C\’”L, 

2. (Trace Equivalence) Q(il) = £2(|Jm>o 

The proof directly follows Lemma 1 by a transitivity argument. This result is important 
in establishing a regularity result for a compositional specification in Section 4. 



4 Compositional Specification 

We define a compositional specification C(TI) as a set of algebraic equations over a set 
of assertion languages 

n= {LQ}C{Li,...,Lh-i}C{Lh,...,Li^i}Ll{Li,...,Ln^i}, 

in which each equation is of the form 

1. (Initialization) 

Lo = eU Lo -Oo, (7) 

where Oq = (®, 21), or 

2. (Prefix) 

Li = Lj-Oj, (8) 

for each I <i < h where 0 < j < n and Oj G Z, or 

3. (Summation) 

-T; = 2/q U . . . U , 

for each h <i < I where 1 < ij < h for 1 < j < A:, , or 



(9) 
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4. (Meet) 

Li = Li^n...ULi^,, ( 10 ) 

for each I < i <n where q < ij < I for 1 < y < A:,, 

where 8 denotes the singleton language with the empty word {e} and • denotes the 
concatenation of a letter to the end of each word in a language. 

This style of compositional definition is similar to Milner’s CCS {Calculus of Com- 
municating Systems) [19] with three differences: a special initialization equation, the 
meet operator in place of the parallel composition operator |, and (3) the trace semantics 
of assertion languages. Note also that this style is a generalization of assertion graphs 
in [22, 23]. In fact, without any meet composition, it corresponds to an assertion graph 
where (1) the initial language corresponds to the initial vertex with a self-loop, (2) a 
prefix language corresponds to an edge, and (3) a summation language corresponds to 
a vertex in the graph. 

Example 3. The specification of the VM in Figure 1 is captured by the following set of 
algebraic equations, shortened by use a mixture of language and regular expressions. 



1. (Ready) Station i (1 < / < 3) is in its Ready[i\ state after being reset or polled. 

Ready[i\ = ( {true, true)* ■ {reset, av[i]) UPoll ■ {reset V ^vin[i\,av[i \) ) 

• {resets ^vin[i\,av[i\)* . 

2. (Voting) Station / (1 < i < 3) is accepting a vote. 

Voting[i\ = Ready[i] ■ {^reset Avin[i],av[i]). 

3. (Vote) Station i (I < i < 3) has got a vote. 

Voted[i] = Voting[i\ U {Voted[i]r\Wait) ■ {^reset,^av[i\). 

4. (Wait) At least one voting station is in its Ready state. 

Wait = 

5. (Poll) Every station is in its Voted state and one station is accepting a vote. 

Poll = {r\^^iVoted[i]) n {Ll^^iVoting[i]). 

6. (Output) The VM outputs the polling result. 



Output = Wait ■ {true,^vout) U Poll ■ {true,vout). 



□ 



Note that this specification is not a conjunction of simple independent properties. 
Rather it is a network of semantically inter-dependent “communicating” properties. The 
following theorem shows that this set of equations is well defined in the algebraic sense. 

Theorem 3. The set of equations C(1T) has a unique language solution. 
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The proof is based on Tarski’s fix-point theorem and an induction on the length of the 
words in the languages in the equations. We omit the proof due to the page limitation. In 
the following, we show that the limit of to each language Li in the compositional 
specification is regular. This is significant based on Theorem 2, as it gives us a way 
to precisely model check the specification by constructing an assertion graph for the 
language and verifying the graph using the GSTE model checking algorithm in [22, 
23]. To make the argument simple, we break each U and □ composition into a series of 
pairwise compositions by introducing intermediate languages. 

Lemma 2. 

I- Um>on'"(i:-o) = (Um>on'”i:)-o. 

2. Um>on'”(i:ini:2) = (Um^on^iiOnaj^^on^iiz). 

5. U™>o = (U„>o n^iii ) U (U^>o VTL 2 ) u (U™>o n^iii ) 

n(Um>on'”i:2). 

The proof of the lemma is done by distributing □ over • and U, and then using the 
language containment argument. We omit the proof due to the page limitation. 

Theorem 4. Um>o is regular for every language L[ (0 < i < n) in Tl. 

Proof. Consider the power set of the limit languages P(Um>on'”i^i | Li G Tl). Based 
on Lemma 2, each language in the power set can be expanded algebraically into either 
a prefix composition of another language, or a summation composition of some other 
languages in the set. Without the meet, the construction of a regular automaton for each 
language in the set is straight forward. Since Um>o n™/-, is in the set, the claim holds. 

□ 

5 Direct Model Checking of Compositional Specification 

Although model checking any language in the compositional specification can be done 
through the construction of the regular automaton for the meet limit of the language, 
such a construction is likely to cause an exponential blow-up in the specification size. 
To avoid the problem, we develop a GSTE model checking algorithm that directly walks 
through the syntactical structure of the specification and establishes a simulation rela- 
tion from the language elements in the specification to the sets of states in the circuit. 
We first define a model over ® as a triple M = (5, ^,L) where 

1 . 5 is a finite set of states, 

2. Hies X 5 is a state transition relation over S such that for every state s G S, there 
is a state s' GS satisfying (^,^') G Hf, 

3. L : 5 — > 2) is a labeling function that labels each state s G S with a value d G “D. 

M induces a monotonic post-image transformation function post^^ : P(5) ^ P(>5): 

postf^{S') = {s' I 3s G y,(s,s') G HQ, (11) 

for all S' G P(5). We drop the subscript M when it is understood. A model is a Kripke 
structure without the initial state. To follow the STE convention, we assume that every 
state in M is an initial state. We extend L to state sets and define its inverse function 
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L{S’) = {L(j) I s G 5'}, and L {D') = {^ G 5 1 L{s) G £>'} (12) 

for all S' G P(5) and D' G P(D). 

A run of the model is any mapping function y: N — > 5 such that for all i > 0, 
(y(/),y(i+ 1)) G The trace generated by y, denoted by i(y), is the point-wise ap- 
plication of L over y, i.e., L{y){i) = L{y{i)) for all i> 0. The trace language of the 
model, denoted by Q.{M), is the set of all traces generated by the runs of the model, 
i.e., Q(M) = {T(y) I y is a run of M}. 

We say that the model M satisfies an assertion language L, denoted by M\=L, if 

Q.{M)QQ.{L). (13) 

M satisfies the set of equations C(n), denoted by M\=CiJl), if M\=Li for all Li G 11. 

The key idea of the model checking algorithm is to compute a simulation relation 
for each language in 11 on M. A simulation relation is a mapping from each language 
Li (0 <i < n) to a state set T,- G P(5), such that for every state s G S, s G “Ti if there is 
a word w G Li and a run y in M such that 

(1) L(y)\=ant{w), and (2) ^ = y(|w|). (14) 

The simulation relation captures, for each language, the end of any run in the model 
satisfying the antecedent sequence of a word in the language. It has nothing to do with 
consequents. The importance of the simulation relation is stated in the following lemma. 

Lemma 3. M\=C{JT), if for every prefix language Li = Lj ■ O/, T, C cons{<3f). 

Proof First we prove that M\=Li for every prefix language. Assume M f=Li. Then by 
(13) and (1), there is runy of M and a word w = w' ■ <3 j in A, such that (1) L{y)\=ant{w) , 
but (2) L(y(|w|) ^ cons{Oj). By (14), y(|w|) is in Li, and thus L{Li) % cons{Oj). Fur- 
ther, M\=Lq since Lq = (2),®)*. Based on this result, It also becomes obvious that 
M\=Li for every U-composition A,- (li<i< 1) by (2). Finally, consider a H-composition 
Li = A,j n . . . n A,j^ . Consider a trace T in Q(M) . Let w = vvi □ . . . r\Wkf be a word in A,-. 
By (13), we have f\=ant{wj) => x\=cons{w j) for all l<j<k i. Now let us assume 
x\=ant{w). Then by (4), x\=ant{wj) and therefore x\=cons{w j) for 1 < 7 < kt. Thus, 
%\=cons{w). Therefore, x G Ll{Li) by (5) and thus M|=A;. □ 

We now show how to iteratively compute a simulation relation for the specification 
based on its structure. Let Sn to denote the «-ary state set tuple (5o,5i, . . . ,5«-i). We 
define a partial order relation SnL S'^, iff VO < i < n,Si C 5;. (P(5) x . . . x P(5), A) 
forms a finite c.p.o. with the bottom element being 0 „ = (©,.. . , 0 ). 

We define an update function for n-ary state set tuples on model M 



Y{Sn) = {Yo{Sn \ Fi (5„), . . . , Yn-l{S„)) 



(15) 



where 

1. (Initialization) Lq = eU Lq ■ Oq: 



YoiSn) = s. 



(16) 




224 



Jin Yang and Carl-Johan H. Seger 



2. (Prefix) Li = Lj ■ Cj for I <i < h: 



( L {ant{Oj)) if Lj = Lq 

\ post{Sj)nL^{ant{cj)) otherwise, 



( 17 ) 



3. (Summation) Li = iL,j U . . . U for h<i<l\ 

Yi{Sn)=\J%iSi^, (18) 

4. (Meet) Li = Li^ □ . . • for I <i <n: 

Yi{Sn)=n)L^Si^. (19) 

It can be shown that Y is monotonic, and the sequence (F*'(0„), Y^(0„), Y^(0„), . . .) is 
an ascending chain with a least fixpoint, i.e., 3M > 0,\/k > M,Y^(Jdn) = Y^{%n)- 

Lemma 4. Y^{%n) is a simulation relation for C(n) on M. 

The proof is based on an induction on the length of words leading to a state in the 
simulation relation, and is omitted due to the page limitation. Based on this result, 
we develop a GSTE model checking algorithm for the compositional specification in 
Figure 2. The following correctness result holds. 



Algorithm: cGSTE{C{H),post) 



1. To := 5, T; := 0 for all 1 < i< n; 



2. active := {T; | T; = Lq ■ Oj}; 

3. while active f 0 

4. Li \= pickOne(flcf;ve); 

5 . case Li = Lq- a j: 

6. Li = Lj-Oj: 

V. Ti = A'jU...UA,_^: 

8. else: 

9. endcase 

10 . itnewf'Ti 

1 1 . add to active every Tj. 

12. T; := new, 

13. end while 



new L {ant(Oj))\ 

new ’.= postifL j) (ant{<3 j))\ 



I 

new := U 
new := 




having T; in its definition; 



14. if T; cons{aj) for some T; = Lj ■ Oj 

15. return(/a/ie); 

16. retumffrae); 

end. 



Fig. 2. cGSTE 



Theorems. M|=C(n), ifcGSTE{C{Yl),M) returns true. 

The algorithm initially sets the simulation relation to the empty set for every lan- 
guage except for Lq, which is set to S. It then iteratively updates the simulation relation 
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for a language by locally propagating the simulation relations for the languages in its 
definition. It does so until no change can be made to any simulation relation. By avoid- 
ing the construction of a “glohal” automaton which may cause an exponential blow-up 
in the specification size, it becomes conservative but gains great efficiency. 

To show the advantage of our approach over the assume-guarantee based compo- 
sitional approach, we continue on the VM in Example 1 . Figure 3 shows two different 
implementations of the VM. Implementation (1) is partitioned into four modules, one 
for each station and one for the final polling. Each station i tracks its own voting status 
in v\{\. Implementation (2) bundles all the signals from and to the three stations into 3- 
bit vectors, and the vector av[l: 3] tracks the availability status of each station. Assume 
that clr (set) sets the register to 0 (1) for the current and next steps. 



av(lj 

vinll] 

av[2] 

vin(2J 

av[.1] 

vinl3] 



rcfict 




(I) 



reset 




ih 



Fig. 3. Two VM Implementations 

The assume-guarantee based approach is heavily implementation dependent and 
requires a clear understanding of the implementation details of the VM. For instance, 
given implementation (1), the overall specification may be decomposed into four local 
properties for the four modules, linking together through the interface behaviors of the 
voting status signals v[l], v[2] and v[3]. The property for the polling module may say, 
among other things, that if v[i] is high for all 1 < i < 3, then out = 1. The correctness of 
the decomposition also needs to be justified. Further, the change of the implementation 
could require an entirely different decomposition. For instance, the decomposition for 
implementation (2) would be bit-slicing based and relies on behaviors of the availability 
signals av[3: 1] to glue local properties together. It is conceivable that such a manual 
approach will be labor-intensive and difficult to get it right for complex designs. 

On the contrary, our compositional specification is implementation independent. 
The languages in the specification can be viewed as a logical decomposition of the 
specification with no mentioning of the internal signal behaviors. Our model checking 
algorithm automatically computes the mapping from the end behavior specified by each 
language to the set of corresponding circuit states for any given implementation. Table 1 
summarises the final simulation relations for Ready[i], Voting[i\ and Voted[i] computed 
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by the algorithm on the two implementations. Based on this, the simulation relation for 
Poll on implementation (1) is -^clrA (A?^j {vin[i] V v[i])) A (V?^[ {vin[i] A ^v[i])), which 
allows one to conclude that the implmentation will indeed satisfy out = 1 at the next 
step. Finally, we point out that the quaternary simulation aspect of GSTE allows our 
algorithm to focus only the relevant part of the model at each iteration step and store 
the simulation relations efficiently in an abstract form. We will not talk about it in the 
paper due to the page limitation. 

Table 1. Final Simulation Relations for the VM 



Language 


1 Implementation (f) | 


Implementation (2) 


Ready[i] 


r 

< 

> 

r 

> 


{set V -^vin[i ] ) A av[i] 


Voting[i] 


-^dr Avin[i] A-iv[i] 


-^set A vin[i] Aav[i] 


Voted[i] 


-ic/rA {vin[i\ Vv[;]) 


-^set A {vin[i\ V ->av[{\) 



6 Verification of Micro-instruction Scheduler 

The compositional GSTE has been implemented as a part of the GSTE verification sys- 
tem inside the Intel Forte environment ([1]). In this section, we discuss the verification 
of a micro-instruction (uop) scheduler in the original Intel0Pentium0 4 Microproces- 
sor Scheduler/Scoreboard unit (SSU) (see Figure 4) as described in [5, 21]. The sched- 
uler can hold and schedule up to 10 micro-instructions. The correctness property for the 
scheduler is that “when the resource is available, the oldest ready uop in the scheduler 
will be sent out.” Even a much weaker version of the property, on the priority matrix 
module only stating “a uop would be scheduled if there are ready uops”, had been quite 
difficult to prove previously using a state-of-the-art in-house symbolic model checker 
based on an assume-guarantee based approach. The logic involved created significant 
tool capacity problems, and required that the high level property be decomposed into 
literally hundreds of small local properties. Creating and proving this decomposition 
was done manually, and required a significant amount of time. Its maintenance and 
regression has been costly as the design changed. 

Using the compositional GSTE approach, we were able to specify the entire cor- 
rectness property succinctly in a compositional manner at the unit level, and verify it 
very efficiently using the compositional model checking algorithm. The compositional 
specification was developed in a top-down fashion. The top level property is 

Prop = OldestReadyU op[{\ • {~^stop,sched[t\) 

which simply says that if uop[i\ (0 < i < 10) is the oldest ready uop, then schedule 
it at the next step if the resource is available. We then expand the property 
OldestReadyU op[i \ : 

OldestReadyU op[{\ = Ready[t\U{n otReady[j] U EnqueuedEarlier[i, j])) 

which simply defines the oldest ready uop as the uop that is ready and was enqueued to 
the scheduler earlier than any other ready uop. We keep expanding the properties until 
they are described in terms of input/output signal behaviors. 
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ready 




stop 



Fig. 4. Intel® Pentium® 4 Microprocessor SSU Scheduler 



A recent rerun of the verification was done on a computer with 1 . 5 GHz Intel® 
Pentium® 4 processor with 1 GB memory. It was done on the original SSU circuit 
with no prior abstraction, which contains 718 latches, 361 inputs, and 17367 gates. The 
verification was done in 122 seconds using only 36MB memory. Most of the memory 
was used for storing the circuit. The tremendous verification efficiency benefits from 
the specification driven state space exploration strategy of the GSTE algorithm and the 
extended quaternary circuit abstraction technique [22]. Because of these, the verifica- 
tion complexity largely depends on the complexity of the specification to be verified 
rather than that of the circuit under verification, and is very scalable when the number 
of uops handled by the scheduler increases. 

Finally, we would like to point out that compositional GSTE has been in active use 
in Intel and been successfully applied to the verification of several large-scale complex 
circuits in microprocessor designs. 



7 Conclusion 

In this paper, we presented a practical compositional extension to GSTE. For future 
work, we would like to extend the terminal condition and the fairness condition [23] to 
compositional GSTE, and apply the abstraction/refinement methodology in [22] to the 
compositional framework. 
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Abstract. Verifying whether an co-regular property is satisfied by a finite-state 
system is a core problem in model checking. Standard techniques build an au- 
tomaton with the complementary language, compute its product with the system, 
and then check for emptiness. Generalized symbolic trajectory evaluation (GSTE) 
has been recently proposed as an alternative approach, extending the computa- 
tionally efficient symbolic trajectory evaluation (STE) to general co-regular prop- 
erties. In this paper, we show that the GSTE algorithms are essentially a parti- 
tioned version of standard symbolic model-checking (SMC) algorithms, where 
the partitioning is driven by the property under verification. We export this tech- 
nique of property-driven partitioning to SMC and show that it typically does 
speed up SMC algorithms. 



1 Introduction 

Verifying whether an co-regular property is satisfied by a finite-state system is a core 
problem in Model Checking (MC) [23,31]. Standard MC techniques build a comple- 
mentary Buchi automaton (BA), whose language contains all violations of the desired 
property. They then compute the product of this automaton with the system, and then 
check for emptiness [30,23]. To check emptiness, one has to compute the set of fair 
states, i.e., those states of the product automaton that are extensible to a fair path. This 
computation can be performed in linear time by using a depth-first search [10]. The 
main obstacle to this procedure is state-space explosion, i.e., the product is usually too 
big to be handled. Symbolic model checking (SMC) [3] tackles this problem by repre- 
senting the product automaton symbolically, usually by means of BDDs. Most symbolic 
model checkers compute the fair states hy means of some variant of the doubly-nested- 
fixpoint Emerson-Lei algorithm (EL) [13]. 

Another approach to formal verification is that of Symbolic Trajectory Evalua- 
tion (STE) [28], in which one tries to show that the system satisfies the desired prop- 
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MIUR COFIN02 project, code 2002097822.003, and by a grant from the Intel Corporation. 
** Supported in part by NSF grants CCR-9988322, CCR-0124077, CCR-03 11326, IIS-9908435, 
IIS-9978135, EIA-0086264, and ANI-0216467 by BSE grant 9800096, and by a grant from 
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erty by using symbolic simulation and quaternary symbolic abstraction. This often en- 
ables quick response time, but is restricted to very simple properties, constructed from 
Boolean implication assertions by means of conjunction and the temporal next-time 
operator [5]. Recently, GSTE [33] has been proposed as an extension of STE that can 
handle all co-regular properties. In this framework, properties are specified by means 
of Assertion Graphs (AG). The GSTE algorithm augments symbolic simulation with a 
fixpoint iteration. Recent work on GSTE, e.g., in [34], has described various case stud- 
ies and has focused mainly on abstraction in GSTE. The fundamental relation between 
GSTE and SMC, however, has not been completely clarified. The basic relationship be- 
tween AGs and BAs is sketched in [21], but the algorithmic relationship between GSTE 
and SMC has not been studied. 

In this work, we analyze the property-specification language and the checking al- 
gorithm used by GSTE and compare them to those used in SMC. (We do not deal with 
abstraction, which is an orthogonal issue.) We first fill in details not given in [21] to 
show that assertion graphs are essentially universal co-automata [24], which require all 
runs to be accepting. Universal automata enjoy the advantage of easy complementa- 
tion; in fact, they can be viewed as nondeterministic automata for the complementary 
property. Eormally, given a BA, one can easily construct an AG for the complemen- 
tary language, and vice versa. This permits us to do a direct comparison between the 
algorithms underlying GSTE and SMC. 

We then point out that the GSTE algorithms are essentially a partitioned version of 
the standard SMC algorithms. SMC algorithms operate on subsets of the product state 
space SxV, where S is the state space of the system and V is the state space of comple- 
mentary automaton. We show that GSTE operates on partitioned subsets of the product 
state space. The partitioning is driven by the automaton state space. The GSTE analog 
of a subset QQ SxV is the partition {Qv : v G V}, where Qv = {s '■ (i,v) G Q\. The 
GSTE algorithms are in essence an adaptation of the standard SMC algorithms to the 
partitioned state space. Thus, rather than operate on a BDD representing a subset P of 
the product state space, GSTE operates on an array of BDDs, representing a partitioning 
of P. We refer to such partitioning as property-driven partitioning. 

Einally, we proceed to explore the benefits of property-driven partitioning in the 
framework of SMC. We use NuSMV [6] as our experimental platform in the context of 
LTL model checking. We added to NuSMV the capability of property-driven partitioned 
SMC, both for safety LTL properties and for full LTL properties, and compared the 
performance of SMC with partitioned SMC. We find that property-driven partitioning 
is an effective technique for SMC, as partitioned SMC is typically faster than SMC. The 
major factor seems to be the reduction in the number of BDD variables, which results 
in smaller BDDs. The reduced BDD size more than compensates for the additional 
algorithmic overhead for handling a partitioned state space. 

Partitioning techniques have often been proposed in order to tackle the state space 
explosion problem. (We refer here to disjunctive partitioning, rather than to the orthog- 
onal technique of conjunctive partitioning, which is used to represent large transition 
relations.) Static partitioning techniques, which require an analysis of the state space, 
have been discussed, e.g., in [25]. Dynamic partitioning techniques, which are driven 
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by heuristics to reduce BDD size, have been discussed, e.g., in [4], Partitioning has been 
used in [19] to develop a distributed approach to SMC. 

Property-driven partitioning is orthogonal to previous partitioning techniques. Un- 
like dynamic partitioning techniques, no expensive BDD-splitting heuristics are re- 
quired. Unlike previous static partitioning techniques, property-driven partitioning is 
fully automated and no analysis of the system state space is needed. The technique is 
also of interest because it represents a novel approach to automata-theoretic verifica- 
tion. So far, automata-theoretic verification means that either both system and property 
automaton state spaces are represent explicitly (e.g. in SPIN [20]) or symbolically (in 
NuSMV [6] or in Cadence SMV www-cad.eecs.berkeley.edu/~kenmcmil/smv/). 
Just like GSTE, property-driven partitioning enables a hybrid approach, in which the 
property automaton, whose state space is often quite manageable, is represented explic- 
itly, while the system, whose state space is typically exceedingly large is represented 
symbolically. Other hybrid approaches have been described in [1, 7, 18], but ours is the 
first work to evaluate a hybrid approach in the context of general model checking. 

The paper begins with an overview of the basic notions of SMC [9] and GSTE 
[33] in Section 2: first, BAs and AGs are defined in a new perspective that clarifies 
the common underlying structure; we then describe SMC and GSTE model checking 
procedures. In Section 3, first, we prove that AGs and BAs are equivalent; then, we 
analyze the checking algorithms of GSTE and show that it is a partitioned version of 
standard SMC algorithms. In Section 4, we export property-driven partitioning to SMC 
and we report on the comparison of SMC with partitioned SMC in the framework of 
NuSMV. We conclude in Section 5 with a discussion of future research directions. 



2 BUchi Automata and Assertion Graphs 

In this section, we introduce the specification languages and the checking algorithms 
used by SMC [9] and GSTE [33]. In SMC, we can specify properties by means of 
BAs, while GSTE uses AGs. Both the languages have a finite and a fair semantics. The 
finite semantics is checked with a fixpoint computation, while the fair one requires a 
doubly-nested fixpoint computation. 

We define a system M as a tuple {S,Si,T), where S is the set of states, Sj C S is 
the set of initial states, T C S x S is the transition relation. We use capital letters such 
as Y,Z,.. to denote subsets of S. We define functions post,pre : 2^ — > 2^ such that 
post{Y) = {V G 5 I {s,s') G T,s G Y} and pre{Y) = {V G 5 | {s',s) G T,s G Y}. A 
finite (resp., infinite) trace in M is a finite (resp., infinite) sequence a of states such that 
o[/-f 1] G post{o[i]) for all 1 < i < |o| (resp., i > 1). A trace o is initial iff o(l) G Si. 
We define Lf{M) as the set of all initial finite traces of M and L{M) as the set of all 
initial infinite traces. 

In the following, we propose a new representation for BAs and AGs: both can be 
seen as an extension of Eair Graphs (EG). This is the structure which AGs and BAs have 
in common. As we shall see, while an AG is an EG with two labeling functions, a BA 
is an EG with just one labeling function. We use labels on vertices rather than on edges 
(as in GSTE [33]). This does not affect the generality of our framework and allows for 
an easier comparison between GSTE and SMC as well as an experimental evaluation 
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in the framework of NuSMV. Moreover, labels are defined as sets of systems states. (In 
practice, labels are given as predicates on system states; a predicate describes the sets 
of states that satisfy it.) 

2.1 Fair Graphs, Biichi Automata and Assertion Graphs 

Fair Graphs are essentially graphs with the addition of a fairness condition. 

Definition 1. A Fair Graph G is a tuple (V,Vi,E,!f) where V is the set of vertices, 
Vi fV is the set of initial vertices, E <ZV xV is a total relation representing the set of 
edges, and = {Fi, ...,F„}, with Ej C V for 1 < j <n, is the set of fair sets. 

A finite (resp., infinite) path in G is a finite (resp., infinite) sequence p of vertices 
such that (p[/],p[i+ 1]) G E for all 1 < i < |p| (resp., i > 1). p is initial iff p[l] G Vj. p is 
fair iff it visits every set E G 9^ infinitely often. We define Lf{G) as the set of all finite 
initial paths of G and L{G) as the set of all fair initial paths. 

For every v G F we define the set of successor vertices F(v) = { v' G F | (v, v') G F} 
and the set of predecessor vertices E^{v) = {v' G F | (v',v) G E} . (The operators E 
and E^ are analogous to post and pre. They are used for clarity of notation.) 

A labeling function is a function y : V — > 2^^. Given a set of vertices V' C F, we 
define the restriction of y to F' as follows: Y|^, (v) = y(v) if v G F', and y\^, (v) = 0 
otherwise. Typically, we use a, (3, y to denote labeling functions. Notice that a labeling 
function y can be considered and represented as a set of subsets of S: {y(v)}vgv- With 
abuse of notation, given two labeling functions a and y, we will write a C y(resp., any, 
aUy) to mean, for all v G F, a(v) C y(v) (resp., a(v) ny(v), a(v) Uy(v)). 

Definition 2. Given a trace O in M, a path p in G of the same length I (resp., both 
infinite) and a function y : V — > 2^, we say that o satisfies p under y (denoted G |=y p ) 
iffG[i] G y{p[i\) for all I <i< I (resp., i> 1). 

A Biichi automaton (BA) is essentially an FG with the addition of a labeling func- 
tion. A trace is accepted by a BA iff it satisfies the labeling function along at least one 
path of the FG. In the following, BAs express complementary properties, that is, their 
language contains all violations of the desired property. 

Formally, a Biichi Automaton B is a tuple (G, L) where G = {V,Vi,E, (f) is a fair 
graph, and L : V — > 2^^ is the labeling function. We define the set Lf{B) (resp., L{B)) 
as the set of finite (resp., infinite) traces of M accepted by B: 

Definition 3. 

- finite semantics: if (f = {F}, Lf{B) = {o G Lf{M) \ there exists a finite path p G 
Lf{G) with |o| = IpI = Z, p[/] G F andG p}; 

- fair semantics: L{B) = {o G F(M) | there exists a fair path p G F(G) with a \=^£ p}. 

Since BAs have the complementary language of the specification, the model checking 
problem consists in verifying whether Lf{B) — %,'m the case of finite semantics, L{B) = 
0, in the case of fair semantics. 

An assertion graph (AG) is essentially an FG with the addition of two labeling 
functions: the antecedent and the consequent. An AG accepts a trace iff, along all paths, 
either the trace does not satisfy the antecedent or if it satisfies the consequent. 
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Formally, an Assertion Graph A is a tuple {G, ant, cons) where G = {V,Vi,E,‘f) 
is a fair graph, ant : V — > 2^ is the antecedent function, and cons : V — > 2^ is the 
consequent function. Given a trace o in M and a path p in G of the same length, we say 
that o satisfies p in A (denoted o \=a p) iff O \=ant p O \=cons P- We define the set 
Lf{A) (resp., L{A)) as the set of finite (resp., infinite) traces of M accepted by A: 

Definition 4. 

- finite semantics: if = {F}, Lf{A) = {o C Lf{M) \ for all finite path p G Lf{G), 
if\<3\ = IpI = / andp[l\ G F, then a p},‘ 

- fair semantics: L(A) = {o G L{M) \ for all fair path p G L{G), O \=a p}- 

The model checking problem for AGs consists in verifying whether L/(M) C L/(A), in 
the case of finite semantics, L{M) C L{A), in the case of fair semantics. 

Example 1. An example of FG is depicted in Fig. 1. The vertices are represented by 
points, the edges by arrows. An arrow without starting vertex point to a vertex to indi- 
cate that it is initial. For simplicity, in the example we have only one fair set. The circle 
around the rightmost vertex means that it belongs to this fair set. 

Examples of BA and AG are depicted resp. in Fig. 2 and 3. They have the same 
underlying FG. In the AG, the labels are represented in the format ant jcons. p and q 
are propositional properties. With the fair semantics, the AG corresponds to the LTL 
property G{p — > Fq), while the BA has the complementary language. □ 

2.2 SMC Algorithms 

Given a system M = (S,S[,T) and a BA B = ((V,Vj,E,fi),L), SMC first computes 
the product P between B and M. Then, in the case of finite semantics, it finds the set 
of vertices reachable from the initial vertices and check if it intersects a certain set of 
vertices Fp in P; in the case of fair semantics it finds the set of fair vertices, i.e., those 
which are extensible to fair paths, and it checks if it intersects the set of initial vertices. 

The product between M and B is a BA defined as follows: P := {{Vp,Ip,Ep, fip) , Lp) 
where Vp = {(i,v) | ^ G M, v G G L{v)}, /p = {( 5 , v) G Vp | j G Si,v G V/}, Fp = 
{((■s,v),('S',v')) I (■s,v) G Vp,{s',v')& Vp,{s,s') & T,{v,v') & E}, <fip = {Fpx,...,Fpn} 
where Fpj = {(s, v) G Vp | v G F,}, Lp{s,v) = { 5 }. 

In the case of finite semantics = {F}, so that fip = {Fp}, where Fp = {(i,v) G 
Fp I V G F|. Then, it is easy to see that Ff{P) = Ff{B). Moreover, every finite path of 
P corresponds to a finite trace of M accepted by B. Thus, to verify that Lf{P) = 0, we 
can just compute the set of reachable vertices and check that it does not intersect Fp. 
Usually, this set is found with a traversal algorithm like the one described in Fig. 4. 
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Algorithm traversal{P) 

1. R\=Ip 

2. N-.= lp 

3. repeat 

4. Z := EY[A] 

5. N:=Z\R 

6. R:=RUZ 

7. until A = 0 

8. return R 

Fig. 4. 



Algorithm fairstates(P) 

1. Y:=T 

2. repeat 

3. Y' :=Y 

4. for Fp e !Fp 

5. Z:=E[yU(yA/»] 

6. y:=rAEX[Z] 

7. until K' = r 

8. return K 

Fig. 5. 



Similarly, in the case of fair semantics, it is easy to see that L{P) = L{B). Moreover, 
every fair path of P corresponds to an infinite trace of M accepted by B. Thus, to verify 
that L{P) = 0 we can just compute the set of fair vertices and check that it does not 
intersect Ip. The standard algorithm to compute the set of fair vertices is the Emerson- 
Lei algorithm (EL) described in Fig. 5 [13]. SMC tools typically implement a variant 
of this donbly-nested fixpoint computation. 



2.3 GSTE Algorithms 

The algorithm used by GSTE to check the AG in the different semantics is described in 
Fig. 6. The function GSTE_fairstates of line 2 is called only in the case of fair seman- 
tics and it is described in Fig. 7. GSTE_fairstates restricts the antecedent function to 
the states of the system that are extensible to fair paths. In the lines 3-9 of Fig. 6, a is 
defined iteratively until a fixpoint is reached. First, a is initialized to be the restriction 
of ant to the set of initial vertices and to the set of initial states. Then, at every iteration, 
a state s is added to a(v) iff s G ant{v) and there exists a state s' G a(v') snch that s is 
reachable from s' in one step and v is reachable from v' in one step. When the fixpoint 
is reached, a(v) contains s iff there exists an initial path p of the assertion graph and an 
initial trace o of the system of the same length I such that p [/] = v,o[l] =s and a \=ant P- 
With an analogous fixpoint compntation (lines 6-10), GSTE_fairstates finds afnnc- 
tion a such that a(v) contains s iff there exist a path p of the assertion graph and a trace 
a of the system of the same length / snch that p[/] G E, p[l] = v, o[l] = s and a \=ant P- 
This compntation is applied for every E G If and it is nested in a second fixpoint com- 
putation: at every iteration the antecedent function is npdated with a until a fixpoint is 
reached. At the end of the outer loop, ant{v) contains s iff there exist a fair path p of the 
assertion graph and an infinite trace a of the system such that a \=ant P- 



3 GSTE vs. SMC 

In this section, we clarify the relationship between GSTE and SMC. First, we show that 
AGs and BAs are eqnivalent. Then, we show GSTE algorithm is essentially a “parti- 
tioned” version of the SMC algorithm. 

We now show that, given a BA B, one can easily find an AG A with the complemen- 
tary language and vice versa. This means that, given a specification tp, one can choose 
either GSTE or SMC techniqnes to check tp, no matters whether tp is an AG or a BA. 
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Algorithm GSTE(M,A) 

1 . if fair semantics 

2. then A := GSTE^fairslates(M, A) 

3. 

4. for V e V a(v) := a(v) n 5/ 

5. repeat 

6. a' := a 

7. for V e P a(v) := 

8. a'(v)uUv'6£-( y) post {a' ( v' ) ) n ant ( v) 

9. until a' = a 

10. if fair semantics 

1 1 . then return a C cons 

12. else retnrn a|^ c COM5 



Fig. 6. 



Algorithm GSTEJairstates{M,A) 

1. repeat 

2 . ant' := ant 

3. for F G iF 

4. for V e V a(v) := 

5 - [Jv'(^E(v),v'(tF pre{ant(v'))r\ant{v) 

6. repeat 

7. a' := a 

8. for V e V a(v) := 

9. a'(v) U Uv'e£(v)P''«(«'(v')) l~l 
antiy) 

10. until a! = a 

11 . ant := a 

12. until ant' = ant 

13. return A 

Fig. 7. 



Moreover, since BAs are nondeterministic (i.e., existential) automata, AGs are revealed 
to be their dual, which are universal automata. 

The following four theorems establish the relationship between AGs and BAs^ 
First, the following two theorems show how to express AGs as BAs. 

Theorem 1 . Let A = {G, ant, cons) be an AG where G= {V,Vi,E, f) and f = {F}. Let 
B be the BA {G',L), where G' = s.t. F' = F x {0, 1,2}, F/ = F/ x {0,1}, 

E' = |((vi,Fi), (v2,F2)) I (vi,V 2 ) G E, k2 G {0,1} if k\ = 0, and k2 = 2 otherwise}, 
!f' = {Ex {1,2}}, L{(y,k)) =ant{v) ifkG {0,2}, and L{{v,k)) = ant{v)n{S\cons{v)) 
ifk= 1. Then Lf lB) = Lf lM)\Lf{A) 

Theorem 2 . Let A = {G, ant, cons) be an AG where G = {V,Vi,E, f) and f = {Fi, ..., 
En). Let B be the BA {G',L), where G' = (F',F/,F', )F') s.t. F' = F x {0,1,2}, F/ = 
F/ X {0,1}, E' = {((vi,A:i),(v 2,A:2)) | (vi,V2) G E, k2 G {0,1} if ki = 0, and k2 = 2 
otherwise}, = {Fi x {2},...,F„ x {2}}, L{{v,k)) =ant{v) ifkG {0,2}, and L{{v,k)) 
= ant{v) n (5'\cons(v)) ifk = 1. Then L[B) = L(M)\L{A) 

The following two theorems show how to express BAs as AGs. 

Theorems. Let B = (G,L) be a BA where G = {V,Vi,E,f) and f = {F}. Let A 
be the AG {G, ant, cons), where ant = L, cons{v) = @ for all v GV. Then LfiB) = 
Lf{M)\Lf{A) 

Theorem 4 . Let B = (G,L) be a BA where G = (y,Vj,E,‘f). Let A be the 
AG {G, ant, cons), where ant = L, cons{v) = %for all v GV. Then L{B) = L{M)\L{A) 

We now compare the algorithms used by GSTE and SMC. In particular, we show 
that the former is essentially a “partitioned” version of the latter. 

^ A longer version of the paper, containing the proofs of the theorems and a more extensive bibli- 
ography , can be downloaded at www.science.unitn.it/~stonetta/partitioning.html. 
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In Section 2, we saw how SMC solves the model checking problem for a BA B\ it 
builds the product automaton P between M and B and it verifies that the language of 
P is empty. GSTE follows an analogous procedure for checking an AG A: it actually 
computes the product between M and Bant, where Bam is a BA with the same underlying 
graph G of A and the labeling function equal to ant. The only difference between SMC 
and GSTE is that the latter operates on partitioned subsets of the product state space. 
The partitioning is driven by the automaton state space and we refer to such partitioning 
as property-driven partitioning. The GSTE analog of a subset Q C Sp is the partition 
{Qv -v GV}, where Qv = {s : (j, v) G Sp}. Indeed, every labeling function yean be seen 
as a division of the model into sets of states, one for every vertex v of the graph, which 
is exactly the set y(v). If Y C ant, then y turns out to represent a set Sy C Sp of states in 
the product defined as follows: = {(5, v)|i G Y(v)} 

One can see that the lines 3-9 of the algorithm in Fig. 6 computes the reach- 
able states of Sp. In fact, we could rewrite lines 6-8 in terms of CTL formulas as 
a = aUEY[a]. Thus, at the end of the loop, a(v) = {^|(5, v) is reachable in Sp}. This 
computation is actually a partitioned version of the one of Fig. 4 with the difference 
that SMC applies the post-image only to the new states added in the previous iteration, 
while GSTE applies the post-image to the whole set of reached states. 

In the case of fair semantics the computation of reachable states is preceded by 
a pruning of the product: GSTE.fairstates finds all vertices of S'/* such that they are 
extensible to fair paths. To compare this procedure with EL, we rewrite the operations 
of GSTE_fairstates in terms of CTL formulas. At the lines 4-5 of the algorithm in Fig. 
7, GSTE_fairstates actually computes the preimage of ant\j, (seen as a set of states 
in Sp). So, we can rewrite these lines as a = EX[(anf|^)]. Furthermore, the lines 7- 

9 are the same as a = aU (ant n EX[(a)]) so that one can see the loop of lines 6- 

10 as a = E[(fl«t)U(a)]. This reachability computation is nested in a second fixpoint 
computation, so that it becomes evident that GSTE .fairstates is a variant of the EL 
algorithm of Fig. 5. 

4 SMC vs. Property-Driven Partitioned SMC 

In Section 3, we saw that GSTE is a partitioned version of SMC. We can also apply 
property-driven partitioning to standard SMC algorithms. In particular, there are two 
algorithms to be partitioned: traversal and fairstates (Fig. 4 and 5). We partitioned 
both of them, by using NuSMV as platform. This choice is motivated by the fact that 
NuSMV implements symbolic model checking for LTL, its source is open, and its code 
is well-documented and easy to modify. 

The “translated” algorithms are shown is Fig. 8 and Fig. 9. Both are based on back- 
ward reachability and respect the structure of NuSMV’s implementation (e.g., the order 
of fair sets is irrelevant). The difference with the non-partitioned versions is that while 
traversal and fairstates operate on a single set of states in the product automaton, 
partitioned Jraversal and partitioned-/ airstates operate on an array of sets of states 
of the system (one set for every vertex of the BA). Thus, every variable in the algorithms 
of Fig. 8 and 9 can be considered as a labeling function. For every set T C 5 of states and 
labeling L, we define the labeling function par£{Y) such that: par£{Y){v) = Y O L{v) 
for all V GV . The initial states of the product are given by par£{S[)\y^ . Given a fair set 
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Algorithm partitioned Jraversal(M ,B) 


Algorithm partitioned-fairstates(M ,B) 


1. 


a:= par£(S\ 


1. 


a := T; 


2. 


P := a 


2. 


repeat 


3. 


repeat 


3. 


a' := a; 


4. 


y:=EX[P] 


4. 


P:=T; 


5. 


P = Y\a 


5. 


for F e IT 


6. 


a := aUY 


6. 


p:=pnE[aU(anpa/-i(5)|^)]; 


7. 


until P = 0 


7. 


a := anP; 


8. 


return a 


8. 


a := anEX[a]; 






9. 


until a' = a 






10. 


return a 



Fig. 8. Fig. 9. 

F of the BA, the correspondent set in the product is given by par£{S) i^. The backward 
image of a labeling function a is given by EX[(a)](v) = Uv'g£(v) pre(a{v')) n L{v). 

We investigated if property-driven partitioning is effective for symbolic model 
checking. In particular, we applied such technique to LTL model checking. In fact, 
it is well known that, given a formula tp expressed by an LTL formula, we can find a 
BA with the same language. The standard LTL symbolic model checkers translate the 
negation of the specification into a BA, they add the latter to the model and check for 
emptiness. The goal of our experiments was to compare the performance of partitioned 
and non-partitioned SMC algorithms. Thus, we did not try to optimize the algorithms 
implemented in NuSMV, but to apply to them property-driven partitioning. The ques- 
tion we wanted to answer is whether the reduction in BDD size more than compensates 
for the algorithmic overhead involved in handling a partitioned state-space. This pro- 
vides also an indirect comparison between GSTE and standard SMC techniques. 

To verify an LTL formula tp, NuSMV calls ltl2smv, which translates ^tp into a 
symbolically represented BA with fairness constraints IT. Then, the function E^G[trMe] 
checks if the language of the product is empty. Since NuSMV does not apply any par- 
ticular technique when tp is a safety formula [22], we enhanced the tool with the option 
-safety: when tp contains only the temporal connectives X, G, and V, it constructs a 
predicate F on the automaton states (representing accepting states for the complemen- 
tary property) and calls the function E[fr«eUE]. In the following, we refer to this proce- 
dure and to the standard NuSMV’s procedure as ' 'NuSMV -safety' ' and ' 'NuSMV' ' 
respectively. We implemented the partitioned versions of both and we refer to latter ones 
as ' 'NuSMV -safety -partitioned' ' and ' 'NuSMV -partitioned' ' respectively. 
The BA is built automatically by ltl2smv in the case of non-partitioned algorithms 
while it is constructed by hand (in these experiments) in the case of partitioned algo- 
rithms. 

We run our tests on three examples of SMV models (for the SMV code, we refer the 
reader to www.science.unitn.it/~stonetta/partitioning.html). For every ex- 
ample, we chose two properties true in the model (one safety and one liveness property, 
see Tab. 1) and two properties that failed (again one safety and one liveness property, 
see Tab. 2). The first example is a dining-philosophers protocol [12]. Concurrency is 
modeled with the interleaving semantics. Typically, a philosopher iterates through a se- 
quence of four states: she thinks, tries to pick up the chopsticks, eats and, finally, she 
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Table 1. Satisfied properties 





Safety 


Liveness 


Dining 


G{{pArAX{r) AXX{r) AXXX{r)) ^ XXXX(e)) 


(AkkvGPa) (GFs) 


Mutex 


G{{ti A A2<i<v^fi) 


G{K\<,<Nti^Fci) 


Life 


G{b^Xc) 


G{{G\b) ^ FG{d)) 



Table 2. Failed properties 





Safety 


Liveness 


Dining 


G{{pArAX{r)AXX{r)AXXX{r)) ~>XXXX{-^e)) 


{GFri) ^ {GFei) 


Mutex 


G{{ti A Akkn A-ic) 


F{ti ^ G-.ci) 


Life 


G{b~*X^c) 


F{{G\b)AGF{\d)) 



puts down the chopsticks. When a deadlock condition happens, a philosopher puts the 
chopsticks down. The safety property true in this example is the following: if a philoso- 
pher is thinking and both her chopsticks are free and she is scheduled for 4 four steps 
in a row, then she will start eating. From this property, we deduce an analogous one 
which fails: with the same premises, after 4 steps the philosopher does not eat. The 
satisfied liveness property states that if every philosopher is scheduled infinitely often, 
then somebody eats infinitely often (at least one philosopher does not starve). In con- 
trast, the following liveness property does not hold in the example: if a philosopher is 
scheduled infinitely often, then she eats infinitely often. 

The second example is a mutual-exclusion protocol: N processes non-deterministi- 
cally try to access the critical session. The access is controlled by the main module, 
which guarantees that a process does not wait forever. The true safety property says 
that, if a process is the only one that is waiting, then it accesses the critical session 
in one step. If we change this property by writing that the process does not access the 
critical session in one step, we obtain the safety property that fails. The satisfied liveness 
property asserts that, if a process is trying, sooner or later it will access the critical 
session. We chose the negation of this property as an example of liveness property that 
fails. 

Finally, the third example is a variant of the game of life: at the beginning there is 
only one creature; every creature has a maximum life set to 100, but it can die non- 
deterministically in every moment; when the age is between 15 and 65, a creature can 
bear a child, which is born in the next step; at most N creatures can be born; when all 
the creatures are dead the game is reset. The true safety property states that, if a crea- 
ture is bearing a child, then the number of born creatures increases; the failed property 
states that the number decreases. The true liveness property asserts the following: if no 
creature will be born anymore, then, after a certain point in the future (likely after a 
reset), the number of alive creatures will be equal to one forever. The negation of this 
property corresponds exactly to the liveness property which failed. 

We run NuSMV on the Rice Terascale Cluster (RTC), a 1 TeraFLOP Linux cluster 
based on Intel Itanium 2 Processors. A timeout has been fixed to 172800 sec. (2 days). 
The results are shown in Fig. 10 and 11. The execution time has been plotted in log 
scale against the number N of processes in the model. Every example takes a column 
of plots. On the first row, we have the safety properties and on the second one the 
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Fig. 10. Satisfied properties of Tab. 1 . X axis: number of processes. Y axis: time. First row: perfor- 
mances of ' 'NuSMV -safety -partitioned'' and ''NuSMV -safety'' on safety properties. 
Second row: performances of ' 'NuSMV -partitioned' ' and ' 'NuSMV' ' on liveness properties. 
1st column: dining-philosophers. 2nd column: mutex. 3rd column: life 



liveness properties. Comparing the partitioned version with the non-partitioned one in 
the case of satisfied properties (Fig. 10), we notice that, in the first two columns (dining 
philosophers and mutual exclusion), the former outperforms the latter. Moreover, in the 
case of the safety property for dining philosophers and the liveness property for mutual 
exclusion, the gap is exponential, i.e. the difference between the two execution times 
grows exponentially with the size of the model. In the third column (life), NuSMV 
does not seem to get relevant benefit from the property-driven partitioning (even if 
you should notice that, in the last point of the liveness case, NuSMV runs out of time). 
Similarly, in the case of failed properties, the partitioned version outperforms always 
the non-partitioned one (see Fig. 11). Moreover, in the case of liveness properties, the 
improvement is exponential for all the three examples. 

5 Conclusions 

Our contributions in this work are two-fold. First, we elucidate the relationship be- 
tween GSTE and SMC. We show that assertion graphs are simply universal automata, 
or, viewed dually, are nondeterministic automata for the complementary properties. Fur- 
thermore, GSTE algorithms are essentially a partitioned version of standard SMC algo- 
rithms, where the partitioning is static and is driven by the property under verification. 
Second, we exported the technique of property-driven partitioning to SMC and showed 
its effectiveness in the framework of NuSMV. 

This opens us several directions for future work. First, we need to combine the tool 
with an automated generator of explicit BAs for LTL formulas and evaluate property- 
driven partitioning for more complex LTL properties. Second, it requires revisiting the 
issue of translating LTL formulas to BAs. Previous translations have focused on mak- 
ing the BA smaller (cf. [16, 1 1, 29, 15]) or more deterministic [27]. The relative merit of 
the two approaches has to be investigated in the context of property-partitioned SMC. 
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Fig. 11. Same pattern as in Fig. 10 but with the failed properties of Tab. 2 



Third, it requires revisiting the issue of symbolic fair-cycle detection. Previous works 
have compared various variations of the EL algorithm, as well as non-EL algorithms, 
cf. [2, 26, 14]. This has to be re-evaluated for property-partitioned SMC. Finally, a ma- 
jor topic of research in the last few years has been that of property-driven abstraction in 
model checking, [8, 17]. The combination of this technique with property-driven parti- 
tioning is also worth of investigation, which could benefit from the study of abstraction 
in GSTE [34,32]. 
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Abstract. We present a novel refinement relation (stnck-free confor- 
mance) for CCS processes, which satisfies the substitntability property: 
If / conforms to S, and P is any environment snch that P | S is stuck- 
free, then P | 7 is stuck-free. Stuck-freedom is related to the CSP notion 
of deadlock, but it is more discriminative by taking orphan messages in 
asynchronons systems into account. We prove that conformance is a pre- 
congrnence on CCS processes, thereby supporting modular refinement. 
We distinguish conformance from the related preorders, stable failures 
refinement in CSP and refusal preorder in CCS. We have implemented 
conformance checking in a new software model checker, zing, and we 
report on how we used it to find errors in distributed programs. 



1 Introduction 

We are interested in checking that message-passing programs are stuck-free [12]. 
Stuck-freedom formalizes the property that a communicating system cannot 
deadlock waiting for messages that are never sent or send messages that are never 
received. In this paper we extend [12] by generalizing the theory of conformance 
and by reporting on its application in model checking distributed programs. In 
our example application, programmers write contracts, which are interfaces that 
specify the externally visible message-passing behavior of the program. Con- 
tracts can be as rich as CCS processes. Stuck-freedom is ensured by checking 
that an implementation conforms to its contract using a model checker. 

Checking stuck-freedom by exploring the state space of the entire system 
quickly leads to state explosion, because the state space grows exponentially in 
the number of concurrent processes; and it requires that the entire system is 
available for analysis, which is especially unrealistic for distributed systems. We 
therefore wish to check stuck-freedom compositionally. If I is an implementation 
of a component and C is its contract, we use the notation I < C to denote that / 
conforms to C . For our compositional approach to be sound with respect to stuck- 
freedom, the conformance relation < needs to obey the following substitutability 
property: If / < C and P is any environment such that P j C is stuck-free, 
then P j 7 is stuck-free as well ( P j C denotes the parallel composition of P 
and C). Substitutability means that the contract of a component can be safely 
used instead of the component in invocation contexts, and hence it helps model 
checking to scale. 

Our notion of conformance is a novel process preorder that preserves stuck- 
ness. Stuckness can be directly observed in any labeled transition system in 
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which visible actions as well as stability (the inability to perform hidden ac- 
tions) can be observed. In our applications this is important, because we want 
to analyze a system by executing a model of it and observing what happens. 
Stuckness is more discriminative than CSP deadlock or unspecified reception 
[10], since stuckness encompasses any “left-over” action on a channel name. In 
addition to deadlock, this includes orphan messages that are never consumed 
in asynchronous systems and directly models important failure conditions in 
software such as unhandled exception messages. 

This paper makes the following contributions: 

— We define a notion of conformance based on standard CCS transition se- 
mantics and prove that it is a precongruence (i.e., it is preserved by all CCS 
contexts) satisfying the substitutability property for stuck- freedom. We dis- 
tinguish conformance from the most closely related known preorders, CSP 
stable failures refinement [3, 6, 13] and CCS refusal preorder [11]. 

— We have implemented a conformance checker in our software model checker, 
ZING. The implementation technique is quite general and can be used to 
adapt existing model checkers for software to do conformance checking. ziNG 
processes can be regarded as CCS processes, and hence our conformance 
theory applies to the zing conformance checker. We have applied zing to 
check contract conformance in a non-trivial distributed application, leading 
to the discovery of several bugs in the application. 

This paper significantly extends the work reported in [12], which proposes a 
conformance relation satisfying the substitutability property. The conformance 
relation of [12] is limited in several respects. First, conformance in [12] is defined 
by reference to the syntactic form of processes^, which precludes a purely obser- 
vational implementation of conformance checking by recording actions generated 
by transitions. Second, the relation in [12] is not a precongruence, because the 
syntactic constraints are not preserved under arbitrary CCS contexts. As a con- 
sequence, natural and useful algebraic laws fail, including the desirable modular 
principle, that P\ < Q\ and < Q 2 implies Pi \ P 2 < Qi \ Q^- Third, the 
theory in [12] is complicated by using a non-standard action^ to treat nondeter- 
minism. This paper provides a substantial generalization of conformance that is 
purely observational, is based on the standard CCS transition system, and gives 
a unified treatment of nondeterminism, hiding, and stability in terms of hidden 
actions (t) . It can be implemented in a model checker by observing visible actions 
and stability, and it can be compared to other semantic refinement relations. In 
addition to proving substitutability, we prove that our generalized conformance 
relation is a precongruence, thereby supporting modular refinement. Finally, we 
have applied this theory by implementing it in a software model checker and 
applying it to contract checking for distributed programs. 

^ For example, some cases in the conformance definition of [12] apply only to processes 
of the form or P -\- Q. Other similar syntactic dependencies of this nature is 

present in [12] as well. 

^ The action is named e in [12]. 
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The remainder of this paper is organized as follows. In Section 2 we discuss 
refinement relations in the CSP and CCS traditions that are most closely related 
to stuck- free conformance. In Section 3 we present our theory of conformance. 
In order to keep the paper within limits, we have left out all proofs in the pre- 
sentation of our conformance theory. Fully detailed proofs can be found in our 
technical report [5]. In Section 4 we describe the application of the zing confor- 
mance checker to find errors in distributed programs, and Section 5 concludes. 

2 Related Work 

Our notion of conformance is modeled on stable failures refinement in CSP [3, 6, 
13] and is inspired by the success of the refinement checker FDR [13] . Our process 
model combines the operational semantics of CCS with a notion of refusals 
similar to but distinct from the CSP notion. The definition of conformance relies 
on simulation and applies directly to any labeled transition system in which 
visible actions and stability can be observed. As is discussed in Section 3.3, 
substitutability with respect to stuck-freedom is not satisfied, if stable failures 
refinement is adopted directly into CCS. The reason is that stuck-freedom is a 
different and more discriminative notion than the CSP concept of deadlock. In 
order to accomodate this difference, our conformance relation is based on the 
idea of ready refusals, which requires a process to be ready to accept certain 
actions while (at the same time) refusing others. 

Stuck-free conformance is also related to the refusals preorder for CCS as de- 
veloped in Iain Phillips’ theory of refusal testing [11]. Refusal preorder allows 
the observation of actions that happen after refusals and can therefore express 
readiness conditions in combination with refusals. However, as discussed in Sec- 
tion 3.4, stuck-free conformance is a strictly larger precongruence than refusal 
preorder. 

The notion of 2/3 bisimulation, presented by Larsen and Skou [7] bears some 
resemblance to conformance. However, Larsen and Skou do not treat hiding and 
internal actions (r-actions), and therefore the problems of stability do not arise 
there. Our theory of conformance gives a unified treatment of internal actions, 
non-determinism, and stability, which are essential in our applications. 
Alternating simulation [1] has been used to relate interfaces to implementations 
in [4]. As discussed in [12], alternating simulation does not satisfy substitutability 
for stuck-freedom. 

3 Conformance Theory 

In this section we give necessary background on CCS [8,9], we define our no- 
tions of stuck-freedom and conformance, and we prove that conformance is a 
precongruence (Theorem 1) and that it satisfies the substitutability property 
(Theorem 2). 

3.1 CCS Processes 

We assume a denumerable set of names J\f = {a, b,c , . . The set £ = Af U {a | 
a G Af} is called the set of labels. The set A = £U {t} is called the set of actions. 
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We let a range over A and we let A range over £, and we write a = a. For a 
subset X of £ we write X = {a \ a G X}. CCS processes, ranged over by P, are 
defined by: 

P ::= 0 I A{ai, . . . , a„) | Gi + . . . + G„ | (P\Q) \ (y a)P 
G ::= a.P 



Here, A ranges over process names, with defining equations A = P. We say that 
the name a is bound in (v a)P. The free names of P, denoted fn(P), are the 
names in P that are not bound. 



Definition 1 (Structural congruence, =). Structural congruence, =, is the 
least congruence relation on terms closed under the following rules, together with 
change of bound names and variables (alpha- conversion) and reordering of terms 
in a summation: 

1. P|0 = P, P\Q = Q\P, P\{Q\R) = {P\Q)\R 

2. {v a)(P|(5) = P\{v a)Q, if a ^ fn{P) 

3. {v a)0 = 0, (ly ab)P = {v ba)P, {va){vb)P = {vb){va)P 

The operational semantics of CCS is given by the labeled transition system 
shown in Definition 2. This system is exactly the one given by Milner [9], except 
that we have added rule [cong], as is standard in recent presentations of CCS. 
In rule [sum] below, M ranges over a summation. 

Definition 2 (Labeled transition). 



M + a.P P /sum] 



p XU p' 

P\Q^P'\Q 



/par-l] 



P ^ P' Q^Q' 
P\Q^P'\Q' 

P = P' P' -^Q' Q' 



P 



Q 



P P' a fh {a, a} 
(u a)P (u a)P' 

PA[b/d]^P' A{a) = 

A{b) ^ P' 



/res] 

— /ident] 



/react] 

— /cong] 



We let PffQ = T.P + t.Q. The difference between a.P # b.Q and a.P + b.Q is 
important. The former represents internal choice, where the process chooses to 
transition to a.P or b.Q, whereas the latter represents external choice [6], where 
the environment controls whether the process moves to P or Q by offering d or 
b. The distinction between internal and external choice is crucial in our notion 
of conformance. 

Asynchronous actions are modeled as actions with no sequential continua- 
tion in CCS. Hence, asynchronous actions are “spawned”, as in a j P, where a 
happens asynchonously. 
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Notation. We write a and a for (possibly empty) sequences of names and 
actions. For a = ao . . . an-i we write P P' , if there exist Pq, Pi, . . . , Pn 
such that P = Po, P' = Pn and for 0 < z < n, we have Pi Pi+i- We use 
the notation P P' , where (r*A) G {A, rA, rrA, . . .}. We write P — > P' 
if there exists a such that P P'; P means there exists P' such that 
P P'; P — > means that P for some a. P is stable if P can make no 
hidden actions, i.e., P-/^, and P is an end-state if P can make no action at all, 
i.e., P Finally, we use the shorthand notation AGo to mean that either A 
or A is among the labels appearing in a. 

3.2 Stuck-Freedom, Ready Refusal, and Conformance 

If two processes, P and Q, communicate via local names in d, we can write the 
system as (lyd) (P|<5) • Since d are names local to P and Q, no further environment 
can interact on d, and therefore it makes sense to test whether the interaction 
of P and Q on d succeeds completely. Informally, we call a process stuck on d 
if it cannot make any progress, and some part of it is ready to communicate 
on a name in d: the communication on d has not succeeded, because it did 
not finish. In our applications, P is typically a model of a program that has a 
local connection d to a process whose specification (contract) is Q. We wish to 
check that the interaction between P and the implementation represented by 
Q is stuck-free, i.e., it cannot get stuck. Stuck-freedom is a safety property and 
can be checked by a reachability analysis of the system P | Q. The following 
definitions make the notions of stuckness and stuck-freedom precise. 

Definition 3 (Stuck process). A process P is called stuck on d, if {i^d)P is 
an end-state, and P for some AGd. We refer to such A as a residual action. 

Definition 4 (Stuck-free processes). A process P is called stuck-free on d if 
there is no P' and a such that P P' with d n d = 0, and P' is stuck on d. 

In the situation mentioned above we apply Definition 4 by searching for a tran- 
sition P I Q P' I Q' such that d n d = 0 and P' | Q' is stuck on d. The 
restriction (z/d) in Definition 3 and the condition dnd = 0 in Definition 4 enforce 
that only internal reactions (r-actions) can happen on names in d, consistently 
with d being local. Since P' | Q' for some AGd, the interaction between P 
and Q ends with a residual action. A, that cannot be matched by any co-action, 
A. If A models an input action, then a component is waiting to receive a message 
that never arrives, and if A models a send action, then a component is sending a 
message that is never consumed. An example of the latter is a remote exception 
message that is not handled. Such situations are important indicators of prob- 
lems in many asynchronous applications. Section 4 has examples of how we have 
used Definition 4 in checking conformance of actual systems. 

We seek a conformance relation, <, such that S < Q guarantees the substi- 
tutability property, that if P | Q is stuck-free, then P | S' is stuck-free, on any 
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selected names d. In the example scenario, this means that it is safe to check 
stuck-freedom of the system P \ Q, where the implementation S is represented 
by its contract Q. To achieve this goal, the relation < must be such that, if 
S < Q, then Q gets stuck on d in any context at least as often as S does. This 
requirement implies that Q may not promise to offer actions that are not ac- 
tually delivered by S and can be elegantly captured by the notion of refusal in 
CSP [6]. However, it turns out that the refusal requirement alone is not suffi- 
cient, because it does not rule out all cases of residual actions. This motivates 
the following definitions, where refusals are strengthened to ready refusals. 

Let init{P) = {a\ P and let denote the singleton sets of £ together 
with the empty set, = {{A} | A G £} U {0}. 

Definition 5 (Refusal). If X is a subset of C, we say that P refuses X if and 
only if P is stable and init(P) C\X = %. We say that P can refuse X if and only 

if there exists P' such that P P' and P' refuses X . 

Definition 6 (Readiness). If Y G we say that P is ready on Y, if and 

only if P is stable and A G F implies P Notice that any stable process is 
trivially ready on 0. 

Definition 7 (Ready Refusal). If X C C and Y G we say that P can 
refuse X while ready on Y if and only if P can refuse X from a state that is 
ready on Y, i.e., there exists P' such that P P' , P' refuses X, and P' is 
ready on Y . 

Notice that ready sets are defined in terms of actions (Definition 6) but refusals 
are defined in terms of co-actions (Definition 5). Initially, this can be a bit 
confusing. For example, the process a refuses {a} and is ready on {a}. 

Definition 8 (Conformance Relation). A binary relation TZ on processes is 
called a conformance relation if and only if whenever P TZ Q, then the following 
conditions hold: 

Cl. If P P' then there exists Q' such that Q Q' and P' TZ Q' . 

C2. If P can refuse X while ready on Y , then Q can refuse X while ready on Y. 

Condition [C2] in Definition 8 may appear surprising. It is further motivated 
and discussed below, including Section 3.3 and Section 3.4. 

If TZi and TZ 2 are binary relations on processes, we define their composition, 
denoted TZi 0 TZ 2 , by 



o 7^2 = {(P, Q) I 3R. {P, R)eTZi A {R, Q) G TZ 2 } 
Lemma 1. Let {TZi}i^j be a family of conformance relations. Then 

1. The relation Ci^iTZi is a conformance relation 

2. For any i,j G I, the relation TZi °TZj is a conformance relation 

3. The identity relation on processes is a conformance relation 
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Lemma 1.1 shows that we can define < as the largest conformance relation by 
taking the union of all conformance relations, in the way that is standard for 
CCS bisimulation and simulation [8]. 

Definition 9 (Conformance, <). The largest conformance relation is referred 
to as conformance and is denoted <. We write P < Q for (P,Q) G <, and we 
say that P conforms to Q. 

Condition [C2] of Definition 8 ensures that, ii P < Q, then Q gets stuck on a 
name a as often as P does. In Definition 8 it is very important that the readiness 
constraint is only imposed one name at a time, by the fact that the ready sets Y 
are at most singleton sets (choosing Y = % yields the standard refusals condition 
with no readiness constraint, as a special case.) This allows a specification to be 
more nondeterministic than its refinements, because the specification may resolve 
its nondeterminism differently for each name. For example, we have a + h < affb. 
Considered over the alphabet of names {a, b}, the process a + b refuses {a, b} 
and is ready on both {a} and on {b}. The process affb refuses {a, b, b} from the 
state a, and it refuses {a, a, 6} from the state b. From the state a, it is ready on 
{a}, and from the state b it is ready on {b}. Therefore, condition [C2] is satisfied. 
The reader may wish to verify a few more interesting examples: 

a\b < {a.b)ff{b.a), {a.b)ff{b.a) ^ a | 6, a|6 = a.b+b.a 

where we write P = Q if and only if P < Q and Q < P. We also remark that 
= C <, by rule [cong] (further algebraic properties of conformance can be 
found in [5].) 

Proposition 1. The conformance relation < is reflexive and transitive. 

The following theorems state our main theoretical results, precongruence (the 
operators of CCS are monotonic with respect to conformance. Theorem 1) and 
substitutability (specifications can safely be substituted for processes that con- 
form to them. Theorem 2). Full proofs are given in [5]. Let C range over CCS 
contexts, which are process expressions with a “hole” (written []) in them: 

C::=[] I (P|D) I (Dl^) I («.[] + M) I ((H []) 

We write C[Q] to denote the process expression that arises by substituting Q 
for the hole in C. 

Theorem 1 (Precongruence). P <Q implies C\P] < C[(5]. 

Theorem 2 (Substitutability). Assume P < Q. Then C\Q] stuck-free on d 
implies C[P] stuck-free on a. 

The following two sections imply that conformance is finer than stable failures 
refinement in CSP (when conformance is defined in terms of traces or, alterna- 
tively, when stable failures refinement is defined in terms of transitions) and 
coarser than refusal preorder in CCS. In this sense, conformance is in between. 
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3.3 Conformance and Stable Failures Refinement 

Condition [C2] in the definition of conformance is adapted from the theory of 
stable failures refinement in CSP [6, 13], which is based on the concepts of traces 
and refusals. In the stable failures model, a process P is represented by the set 
of its failures. A failure of P is a pair (A,X) where A is a finite trace of P and 
X is refusal set, i.e., a set of events P can refuse from a stable state after A. 
Process P failure-refines process Q if the traces of P are contained in the traces 
of Q and the failures of P are contained in the failures of Q. 

In our conformance definition we use refusals, but using the stronger ready 
refusals in condition [C2] . This is motivated by the requirement that conformance 
should be substitutable with respect to stuck-freedom. As an example, consider 
the two processes P and Q defined by 

P = a.O and Q = a.O + t.O 

Considered as processes over the label set {a, a}, the failures^ of P are: 
{((), {a}), ((a), {a, a})}, and the failures of Q are: {((), {a, a}), ((a), {a, a})}. It 
follows that P failure-refines Q. However, P is stuck on a, but Q cannot get stuck, 
since its only stable derivative is 0, which is not a stuck process. Hence, even 
though P failure-refines Q, stuck-freedom of Q does not imply stuck-freedom 
of P. In contrast, the readiness constraint in condition [C2] in our definition of 
conformance entails that P ^ Q, because P can perform action a from a stable 
state (P itself), whereas Q can only perform action a from an unstable state {Q 
itself). 

The natural trace model corresponding to stuck-free conformance modifies 
the stable failures model to use ready refusals in its failures, so that a failure is 
now a pair (A, R) where R = {X, Y) is a ready refusal consisting of a refusal set 
X C C and a ready set Y G . 

The differences between stable failures refinement in CSP and stuck-free con- 
formance results from differences between stuckness and CSP deadlock. In CSP, 
deadlock is indicated by the absence of a special, successful termination event, 
V^. Stuckness is directly observable on states in a labeled transition system and 
is more discriminative than CSP deadlock. 

3.4 Conformance and Refusal Preorder 

Our notion of conformance bears resemblance to the refusal preorder as defined 
in Iain Phillips’ theory of refusal testing [11]. In the refusal preorder, we can 
observe what happens after a refusal (or a deadlock) by turning refusal sets 
into observations. Following [2], we can define refusal preorder by adding the 
transition rule 

P — > P, provided P refuses X 

for X C C. For p G p(P) U £, we write P P' if and only if P Pi — ^ 
P 2 P'. We lift to vectors p in the usual way. We then define failure traces, 
® We express failures in terms of maximal refusal sets and use (. . .) for sequences. 
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f-traces(P) = {p G (p(>C) U £)* | P The refusal preorder, denoted <rf, 

is defined by setting P <^/ Q if and only if f-traces(P) C f-traces{Q). This 
definition captures simultaneous refusal- and readiness properties. For example, 

with C = {a, a}, we have a a — ^ 0 which shows that a is a stable state 
from which the set {a} is refused and the label a is offered. It is interesting 

to observe that we have a ^rf a + t.O, because the transition a a — ^ 0 
cannot be matched by a -I- r.O. Indeed, the refusal preorder is strong enough to 
guarantee substitutability for stuck- freedom. However, it is more restrictive than 
conformance. Consider the processes P and Q defined by 

P = (a -I- 6.c)#(c -I- b.a) 

Q = {a + &.a)#(c -I- b.c) 

It can be checked from the definitions (or see details in [5]) that P < Q but 
P ^rf Q- The reason we have P < Q is that Q is allowed to choose different 
derivatives as witnesses of condition [Cl] and [C2j. Condition [Cl] is witnessed 
by the derivative (c-l- 6.c) of Q, and [C2] is witnessed by the derivative (a -I- b.a) 
of Q. 

4 Application 

Based on the theory presented in Section 3, we have implemented a conformance 
checker that can be used to analyze asynchronous programs written in common 
programming languages. Such languages (e.g., C, Java, C#) have several features 
(such as procedure calls, pointers, shared memory, exceptions, and objects) that 
are cumbersome to model directly in CCS. Our conformance checker accepts 
a language zing, which supports such features directly without the need for 
complicated encodings. The operational semantics of zing is given as a labeled 
transition system. All statements except the send and receive statements are 
modeled as transitions labeled with r actions. Transitions from send and receive 
statements are labeled with the corresponding channel names (ignoring the data 
values that are sent or received). Note that Definition 8 of conformance makes 
no reference to the syntax of CCS, and is purely based on the semantics of the 
labeled transition system. Thus we can define conformance between two zing 
processes using Definition 8 directly. 

We have applied our conformance checker to check contracts of distributed 
programs. We next describe how the conformance checks were done and the 
errors that we found. 

Our example application is a distributed program for managing inventory 
and ordering processes for a bookstore. Contracts are specified as communicat- 
ing state machines. We used the zing conformance checker to check that imple- 
mentations conform to their contract specifications and that implementations 
do not get stuck with contracts specifying other components that they use. 

The structure of the system is shown in Figure 1 (top) . It contains five compo- 
nents, ShoppingCartlmpl, Inventorylmpl, Orderlmpl, Publiclnventorylmpl 
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Contract ShoppingCart 



Additem? . ShoppingCart 
+ Removeltem? . ShoppingCart 
+ Checkout?. AcknowledgeOrder ! 
+ Cancel? . AcknowledgeOrder ! 



Contract InventoryReservation = 

Reserveinventory? . InventoryReservation 
+ Unreserveinventory? . InventoryReservation 
+ CoiranitReservation? . Acknowledgement! 

+ CancelReservation? . Acknowledgement I 
+ Timeout 



Fig. 1. Structure of the Bookstore System (top) and two contracts (bottom) 



and Inventoryprocessing, and, in addition, two user interface processes. There 
is an arrow from component A to component B, if A uses (is a client of) B. Each 
of the five components has an associated contract, which is a specification of the 
publically visible behavior of the component. In Figure 1 (top) each contract 
is named in association with the component it specifies. Two of the contracts 
are shown in Figure 1 (bottom). A contract specifies a communicating state 
machine over a set of message types. For example, the ShoppingCart contract 
specifies that the component will repeatedly input messages of type Additem or 
Removeltem until either a Checkout message or a Cancel message is received (we 
use the notation T? for input and T! for output, where T is a message type.) The 
Timeout event used in the InventoryReservation contract is a special event. 
Timeouts can be attached to input constructs in the implementation language. 

The implementation code for a component is placed in a declaration that 
states the contract it is supposed to implement, as in: 

ShoppingCartImpI < ShoppingCart 
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By relying on the contracts it is possible to automatically translate an im- 
plementation into a ZING model that substitutes contracts for the components it 
uses. In doing so, we rely on Theorem 2. 

In addition to checking conformance between a component and its contract, 
we always perform a stuckness test that checks for stuck states in the interac- 
tion between a component and the contracts it uses. This is necessary, since, for 
example, a component might have a trivial contract and yet get stuck with com- 
ponents it uses. Such cases are caught and flagged by our conformance checker 
as well. 

The conformance analysis of the Bookstore system was done by compiling 
each component implementation and its associated contract into zing, result- 
ing in two ZING processes, which were then compared using our conformance 
checker. The conformance check was done compositionally, by checking one com- 
ponent implementation at a time against its contract specification, substituting 
contracts to represent the components that are used. The entire process is fully 
automatic. Our conformance analysis detected a number of serious errors includ- 
ing a deadlock, which are briefly described below. In the list below we indicate 
which contract was checked, and which condition ([Cl] or [C2]) of Definition 8 
was violated; we also indicate failure of the stuckness test referred to above. We 
emphasize that these errors were not found before we checked the code^. 

(1) InventoryReservation. Missing timeout specification ([C2]). The con- 
tract failed to specify that the component could timeout on its event loop. This 
resulted in a ready refusals failure identified by the conformance checker, be- 
cause the component would refuse all requests after timing out. The contract 
was corrected by adding a Timeout case. A Timeout event is modeled as an 
internal (t) action, and when attached to an input a it is represented by zing 
as a -T T.O (assuming here that nothing happens after the timeout). 

(2) InventoryReservation. Repeated input not specified ([Cl]). After cor- 
rection of the previous error, the conformance checker reported a simulation 
failure. The problem was that the contract specified that the component takes 
message CommitReservation only once, terminating the event loop of the com- 
ponent. However, the implementation did not terminate its loop on receipt of 
CommitReservation as it should, and the implementation was corrected. 

(3) ShoppingCart. Stuckness. Figure 1 shows the contract ShoppingCart 
and the corrected InventoryReservation. The implementation uses the com- 
ponents InventoryImpI and Orderimpl. The conformance check finds that the 
implementation gets stuck with the InventoryReservation contract in a situ- 
ation where the implementation receives a Checkout message, then goes on to 
send a CommitReservation message to the InventoryImpI and then waits to 
receive an Acknowledgement. This receive deadlocks, if the InventoryImpI has 
timed out. In other situations, the timeout causes stuckness by residual messages 
sent from the ShoppingCartImpI. The implementation was corrected. 



^ The code had not been thoroughly tested. Notice, however, that several of the errors 
would typically not be found by testing. 
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(4) Inventory. Input not implemented in component ([C2]). An input speci- 
fied in the contract Inventory but not implemented resulted in a refusals failure. 
The implementation was corrected. 

(5) InventoryChangeNotif ication. Inputs unavailable after receipt of a par- 
ticular message ([C2]). In one particular state, on receipt of message Done in its 
event loop, the imlmentation PublicInventoryImpI would exit its event loop 
without terminating its communication with a publish-subscribe system in the 
Inventoryprocessing component, whereas the contract does not reflect this 
possibility. Messages for that system could therefore be lost, and the implemen- 
tation was corrected. 

5 Conclusion 

We have presented a novel refinement relation for CCS and demonstrated that 
it is suitable for compositional checking of stuck-freedom of communicating pro- 
cesses. Conformance has the advantage of being directly applicable to a labeled 
transition system in which visible actions and stability can be observed. Stuck- 
ness is more discriminative than CSP deadlock by taking orphan messages into 
account, which is useful for checking asynchronous processes. We proved that 
conformance is a precongruence on CCS processes satisfying substitutability, 
and we distinguished it from related process preorders. We have built a confor- 
mance checker for zing, an expressive modeling language, and have applied it 
to a distributed system. 
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Abstract. Boolean Functional Vectors (BFVs) are a symbolic repre- 
sentation for sets of bit- vectors that can be exponentially more compact 
than the corresponding characteristic functions with BDDs. Additionally, 
BFVs are the natural representation of bit-vector sets for Symbolic Simu- 
lation. Recently, we developed set manipulation algorithms for canonical 
BFVs by interpreting them as totally ordered selections. In this paper 
we generalize BFVs by defining them with respect to a partial order. We 
show that partially ordered BFVs can serve as abstractions for bit-vector 
sets and can be used to compute over-approximations in reachability 
analysis. In the special case when the underlying graph of the partial 
order is a forest, we can efficiently compute an abstract interpretation in 
a symbolic simulation framework. We present circuit examples where we 
leverage the exponential gap in the representations and inherent struc- 
ture in the state-space to demonstrate the usefulness of Partially Ordered 
Boolean Functional Vectors. 



1 Introduction 

Symbolic Model Checking and related state-space exploration techniques usu- 
ally represent sets of states with a characteristic function, often using BDDs to 
encode the characteristic function. A characteristic function is essentially a test 
for membership, returning a one if and only if the input is in the set. Alter- 
natively, bit-vector sets can be represented symbolically by Boolean Functional 
Vectors (BFVs) [4] which map bit- vectors to bit- vectors; the set represented is 
the range of this mapping. Hence, characteristic functions serve as set acceptors 
while Boolean Functional Vectors are set generators. 

We are interested in the Boolean Functional Vector representation for two 
main reasons: 

— They can be exponentially more compact than the corresponding character- 
istic function when using BDDs as the underlying representation. 

* This work was supported in part by the MARCO/DARPA Gigascale Systems Re- 
search Center. Their support is gratefully acknowledged. 
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— Boolean Functional Vectors are the natural representation for Symbolic Sim- 
ulation. 

However, there are two drawbacks that have prevented the representation 
from being used more often. Firstly, the representation is not canonical, per 
se. Secondly, until recently there were no set manipulation algorithms for this 
representation. 

The representation can be made canonical by placing certain restrictions on 
the representation [4, 10]. Recently, we presented algorithms for set union, inter- 
section and projection based on our interpretation of the canonical BFV as an 
ordered selection [5], thus enabling Symbolic Simulation based Model Checking. 

In this paper, we generalize the framework by defining Boolean Functional 
Vectors with respect to a partial order. We show that Partially Ordered Boolean 
Functional Vectors serve as abstractions for bit-vector sets. If the underlying 
graph of the partial order is a forest, the partially ordered BFVs form a lattice 
and the abstraction defines a Galois connection between the BFV space and the 
concrete space of bit- vector sets. 

The partial order allows us to selectively constrain some variables with re- 
spect to each other while ignoring constraints between unrelated variables. This 
is in contrast to most other approaches to abstraction where some of the variables 
are discarded and all constraints between the remaining variables are retained. 

We present algorithms for (abstract) set manipulation and show how to use 
these algorithms for image computation. When the underlying graph is a forest, 
our method is a complete abstract interpretation. We then present two exam- 
ples where we leverage the exponential gap between BFVs and characteristic 
functions as well as inherent structure in the state-space to enable efficient ver- 
ification. 

1.1 Related Work 

Boolean Functional Vectors were originally used in [4] for image computation 
where symbolic simulation was used for next state computation but all other 
set manipulation operations were performed by first converting to characteristic 
functions, performing the necessary operations and converting back to Boolean 
Functional Vectors for the next iteration. The canonical BFVs were described 
in [4, 10]. An efficient algorithm to obtain a BFV from a characteristic function 
(parameterization) was presented in [1]. 

The most prominent use of Boolean Functional Vectors is in Symbolic Trajec- 
tory Evaluation (STE) [2] which is a symbolic simulation based bounded model 
checking approach. In [3], Chou developed a framework for symbolic ternary sim- 
ulation based abstract interpretation. STE has recently been extended to Gen- 
eralized Symbolic Trajectory Evaluation (GSTE) [11] to enable model checking 
of all w-regular properties. This was made possible by using a reparameteriza- 
tion algorithm to convert a given BFV into a canonical BFV. This algorithm, 
presented in [12], can be seen as a special case of the projection algorithm pre- 
sented here. GSTE also uses a ternary abstraction in which each state element is 
classified as concrete or ternary. A choice for a concrete element is represented 
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by a state variable whereas a choice for a ternary element is represented by the 
ternary value X. In this scheme, all relations between concrete values are cap- 
tured and the ternary variables can only be constrained by the concrete values. 
No relations between ternary values are captured. The abstraction by partial 
order BFVs presented here subsumes ternary abstraction. We can model the 
ternary abstraction by a partial order in which the concrete variables are related 
to each other, all concrete variables precede all ternary variables, and the ternary 
variables are unrelated to each other. 

The conjunctive canonical decomposition [7] is a symbolic representation 
closely related to Boolean Functional Vectors. This correspondence was explored 
in our earlier work [5] . The theory we develop here can be applied, with suitable 
modifications, to conjunctive decompositions as well. 

In [6], the authors perform abstraction using overlapping projections. These 
projections correspond to the chains in our partial orders. The difference is 
primarily this: with overlapping projections, there is a set constraint for each 
projection while with partially ordered BFVs, there is an evaluation function 
for each state element. With overlapping projections, the constraints between 
variables occurring together in multiple projections will be repeated in each of 
these projections. With partially ordered BFVs, each variable is constrained only 
once. 

1.2 Preliminaries 

An n- length bit- vector X is a mapping from the set of indices T = {1, . . . , n} to 
the set of Boolean values ,8 = {0, 1}. The set of all n- length bit vectors is noted 
\I ^ B\. Let ^ be a partial order on I and A the associated strict order. The 
set of ancestors for index i is given by ancestors{i) = {j\j A z}. 

Let ^min be the minimum relation whose transitive, reflexive closure is 
When the graph (X, <min) is a forest, we say that z is a root if it has no ancestors, 
and we define parent{i) when z is not a root by parent{i) <rnin *• The set of 
children of an index is then given by children{i) = {j\i = parent(j)} . 

In the following, V = {vi,. . . ,n„) represents a vector of Boolean variables, 
X and Y represent bit-vectors and F, G and H represent Boolean Functional 
Vectors. 

2 Partially Ordered Boolean Functional Vectors 

Definition 1. An (2, F)-BFV is a vector of Boolean functions F = (/i, . . . , /„) 
such that for all indices i: 

n(.v) = fl(y) + ff(v)-v. 

where ff and ff are mutually exclusive Boolean functions of the ancestors of i: 
f}{X) ■ ff{X) = 0 

(Vj G ancestor s{z) MX) = f,(Y)) ^ (//(X) = //(V)) A (/f(X) = /f(f)) 
for allX,Y G [I ^ B]. 
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An (X, A)-BFV is to be interpreted as an ordered selection. The variable Vi 
represents an input choice for the z-th bit while ft is the corresponding selection. 
The definition requires that the selection for the z-th bit is constrained only by 
the selections for its ancestors. We will refer to // and ff as the forced-to-one and 
free-choice conditions for the z-th bit. Additionally, we define the forced-to-zero 
condition = ^(// -h /f). 

Definition 2. We define the space to include all (X, :<)-BFVs and extend 

it to by including a bottom element to represent the empty set: 

= {F\F is an (X, ^)-BFV} 

We now define a concretization function which maps an (X, A)-BFV to its 
range and the bottom element to the empty set. 

Definition 3. The concretization function 7 : 1 -^ ^([2^ ^ ^]) is given 

by: 

7(T) = 0 

7(F) = {X&[X^ B]\BY G [X B].F{Y) = X} for all F G 

We say that F abstracts a bit vector set S if j{F) X S and that F repre- 
sents S if 7(X) = S. Not all bit-vector sets S can be represented in _.^but 
our definition ensures that if there is a representation for S, then it is unique. 
Additionally, vectors in the range of an (X, A)-BFV are mapped to themselves: 

Theorem 1. Given F,G G X(x.<) o,nd X G [X 1-^ ,8].' 

7(F) = 7(G) G [I ^ B].F{X) = G{X) 

X G 7(F) 47 F{X) = X 

The above theorem gives us a procedure to obtain the characteristic function 
1^7(X) ^ from its BFV F. Recall that x.y(^p){X) = 1 if and only if X G 7(F). 

From Theorem 1 it follows that y^^^^(X) = 1 if and only if F{X) = X. Hence, 
we can derive: 

iei 

We observe that (vi ^ fi,...,Vn ^ fn) is a canonical conjunctive decomposi- 
tion [7] for 7(F). The theory we develop in this paper for Boolean Functional 
Vectors applies, with suitable modifications, to conjunctive decompositions as 
well. 

We now define a partial ordering C on by lifting the subset ordering 

C on bit-vector sets. 

Definition 4. The partial ordering G is defined on by: 

FCG47 7 (F) C 7 (G) 
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Fig. 1. Partial Orders used in Example 1 



Example 1. Given S = {000, 100, 111}, I = 1,2,3 and V = (vi,V 2 ,V 3 ), let ^i, 
^2 and ^3 be the reflexive transitive closures of the partial orders depicted in 
Figures 1(a), 1(b) and 1(c) respectively. 

The (X, Ai)-BFV Fi = (vi,vi ■ V 2 ,vi ■ V 2 ) represents S, i.e., 7(^1) = S. 

The (X, ^2)-BFV F 2 = (t^i,fi • V 2 jV\ ■ V 3 ) abstracts S. We have 7(^2) = 
{000, 100, 101, 110, 111} which is a superset of S. Moreover, F 2 is the minimum 
abstraction for S in 

The (X, A3)-BFVs G = {{v2 ■ V3) + (~^V2 ■ -’U3) • fi,t^2,f3) and H = {{v2 + 
fa) + (^ti2 • • vi,V 2 ,V 3 ) abstract S, since 7(G) = {000,001,010,100,111} 

and 7 ( 1 ?) = {000, 101, 110, 100, 111}. Note that G and H are unrelated minimal 
abstractions. 

Lemma 1. If (X, -<min) is a forest then there is a minimum abstraction a{S) 
in for every set S C X([X 1-^ B]). 

Theorem 2. If (X, is a forest then: 

(-^(i ^)> !=) forms a complete lattice. For F, G S ^y the least upper bound 
and greatest lower bound are given by: 

F UG = a{'^{F) U 7(G)) 

F n G = a{-y{F) n 7(G)) 

2. The pair of adjoined functions (a, 7) forms a Galois connection between the 
concrete space {V{[T 1— > /S]),C) and the abstract space {iF^ ^yG). For all 

S eV([I^ B]) and F € 



S C 7(0(5')) 

F = o(7(F)) 

Note that concretization does not lose any information. Furthermore, if (X, A) 
is a total order, then the abstract space IF^ is isomorphic to the concrete space 
V{[T I— > B]) and no information is lost in abstraction, either: 

Theorem 3. If iF^ is totally ordered, then for all S S X^X i— > B]): 



5 = 7 ( 0 ( 5 )) 
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3 Algorithms 

In this section, we assume that (I, ~<min) is a forest and present algorithms for 
computing the least upper bound and greatest lower bound of (X, A)-BFVs F 
and G to give us abstractions for set union and intersection respectively. We also 
present an algorithm to compute the minimum abstraction for projection. 

The set union and intersection algorithms are modified from [5] to take into 
account that ^ is not necessarily a total order. The algorithm for projection is 
new and is more efficient than computing the union of the cofactors because of 
fewer intermediate computations. 

The algorithms presented here can be modified for the case when (X, A mm) 
is not a forest. The modifications are required to account for cases when there 
are conflicting constraints from unrelated ancestors for the selection of a bit. In 
such a case, we could obtain a minimal abstraction by choosing the constraint 
from one of these ancestors, or an upper bound of these minimal abstraction by 
ignoring all constraints in case of a conflict. 

3.1 Set Union 

In selecting a vector from the union of two sets, we could select the vector from 
either of the operands. If we select the bits one at a time (in an order consistent 
with A), initially we can make a selection from either set. In this scenario, the 
bit being selected is forced-to-zero(one) if and only if it is forced-to-zero(one) in 
both sets. We can continue to select from either set, until we commit ourselves 
to one of the operands. This happens when the bit being selected is forced-to- 
zero(one) in one of the operand sets and we select the opposite value. From then 
on, we can exclude the operand set that disagrees with our selection. 

Given (X, A)-BFVs F and G, we define conditions /f and gf to indicate when 
F and G can be excluded from the selection for the union. If z is a root, then: 

/f = 0 
gf = 0 

Otherwise, let j = parent (i): 

fr = fj+f!-h, + fi-^h, 

gf = 9j + 9j ■ hj + g] ■ ^hj 

We now define FI so that bit i is forced-to-zero(one) in the union if and only if 
it is forced-to-zero(one) in the non-excluded sets: 

h^^=ff■9f + ff■9f + ff■9f 

hl = fl-gl + fl-gf + ff-g} 

Theorem 4. Given (X, <)-BFVs F and G, let H be defined as above. Then: 

H = FUG 

From Theorems 2 and 4, it follows that our algorithm computes an over-approxi- 
mation of the corresponding set union, i.e., j{H) X ^{F) U 7 (G). 
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3.2 Set Intersection 

A vector can be selected from the intersection of two sets only if it can be selected 
in both sets. Hence, we can make selections for bits only if both operands agree 
on the selection. The selection is forced-to-zero(one) in the intersection if it is 
forced-to-zero(one) in either operand. We must be careful, however, to avoid 
conflicts that can occur when, for some bit, the selection in one operand is 
forced-to-zero while the selection is forced-to-one in the other operand because 
of the selections made for the ancestors of the bit in question. 

Given (I, A)-BFVs F and G, we define elimination conditions Cj to indicate 
conflicting selections while computing the intersection: 

e* = H ifj ■ 9 ] + fj ■ 9j + .ej) 

j ^childr en{i) 

If there is no possible selection for some bit, then the intersection is empty: 

(3z.Ci = 1) H = _L 



Otherwise, we obtain a vector K by eliminating the conflicts: 

= fi + 9i + 
k} = fl + g\ + 



We then obtain Fl by normalizing K by propagating the selection constraints 
(introduced by the elimination) downstream: 



ht = k^ 
hi = 



*^hj j ^ancestor s{i) 
*^hj j ^ancestor s{i) 



Theorem 5. Given (I,<)-BFVs F and G, define H as above. Then: 



H = PnG 



As with set union, it follows from Theorems 5 and 2 that H is an over-approxi- 
mation of the corresponding set intersection, i.e., j{H) O jiF) n 7 (G). 

3.3 Projection 

Given T' C X, the (existential) projection of a set S G V{[X 1 — > B]) on X' is: 

projx>{S) = {X G [X' ^ B]\3Y G S.Vi G X' .Y{{) = X{i)} 

In selecting a vector for the projection, we can select an X G [X' 1 — > B] as 
long as there is some selection for the bits in {X \ X') that can extend X to some 
vector Y in S. The projection loses information about the relation between the 
bits retained and the bits projected out. We capture this information with the 
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Fig. 2. Symbolic Reachability Analysis using Symbolic Simulation with Boolean Func- 
tional Vectors 



don’t-care conditions ff‘^. If i is a root, then ff'^ = 0. Otherwise, let j = parent{i) 
in: 

/f = /f 

= + fj • hj + fj ■ ~^hj otherwise. 

The abstract projection H is now defined so that a bit is forced-to-zero(one) if 
and only if it is forced-to-zero(one) in the care space irrespective of the values 
of the projected out bits. Let V" = {vi\i G {1 \ 1')}- Then, for i e J': 

h\ =^V" .{fl + ft'^) 



Theorem 6. Given an {T,^)-BFV F and X' C X, let Ff be defined as above. 
Then: 

H = a{proji,{-f{F))) 



4 Symbolic Simulation, Model Checking and Abstraction 

Using the algorithms from the previous section, we can perform the computations 
necessary for symbolic model checking using Boolean Functional Vectors. We can 
compute the image (or pre-image) of a given set of states relative to a transition 
relation by computing the relational cross product which involves intersection 
and projection. The fix-point iterations also require set union and an equality 
check to determine the fix point. If the model is a circuit, however, forward image 
computation can be performed by symbolic simulation, making it unnecessary 
to compute the transition relation and its intersection with the current set of 
states for forward reachability analysis (Figure 2). 

Consider a circuit with state elements S = (si, . . . , Sj) and transition func- 
tions A = (Si, , Sn), we associate the choice variables V = (vi, . . . , Vn) with 
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(a) Sorter (b) A compare-and-swap element 

Fig. 3. An Up/Down Sorter 



S. Given an (X, A)-BFV for a set of current states F, we can simulate the 
circuit with the state variables set to F to obtain the next state vector G = 
{6i{F), . . . , Sn{F)). We can then reparameterize G using the next-state variables 
y' = (fn+ii ■ ■ ■ ,U 2 n) to obtain a canonical abstraction H. Note that the range 
of G is the exact image of the set of states we simulated with. The abstraction 
occurs during reparameterization and is determined by the partial order A. 

H is obtained by projecting out the current state bits from the extended 
(l 2 n,^ 2 n)-BFV {vi, . . . , Vn, 9l, ■ ■ ■ , 9n) Over (X2n,^2n) where l2n = {l,...,2n} 
and :< 2 n is obtained by extending A to X 2 n by requiring that all present state 
bits precede all next state bits. In practice, we project out one variable at a 
time, heuristically choosing a ‘low-cost’ variable. This allows us to optimize the 
projection algorithm by only modifying (and computing constraints for) the com- 
ponents that are dependent on the variable being projected out, thus avoiding 
the generation of the monolithic relation between the non-canonical G and the 
canonical H. 

We note that symbolic simulation followed by re-parameterization computes 
a complete abstract interpretation of the circuit transition function [3] when 
(X, ^rnin) is a forest. 

4.1 Example: Up/Down Sorter 

Consider the Up/Down Sorter [9] of Figure 3. It has 2n words, each m bits wide, 
arranged in two columns. Initially, all the words are set to 0. On any cycle, we 
may either insert a word at Input or extract the maximum word in the sorter 
from Output. The operation of the sorter may be viewed as a two step process: an 
insert or extract followed by a compare and swap. If we are inserting a word, then 
in the first step, all words in the left column are pushed down, with Li getting 
Input. For a read operation, in the first step all words in the right column are 
pushed one up, with getting 0 and the value in R\ read out at Output. In the 
second step, adjacent words are compared and if the entry in the left column is 
greater than the corresponding entry in the right column, the two are swapped. 
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1. Insert: 

{L'l <— Input) A (VI < i < n.L[_^_i <— Li) 

or Read: 



{Output ^ i?i) A (VI < i < n.R{ <— Ri+i ) A « - 0) 

2. Compare and Swap: 

VI < i < n.{L{, R') ^ if (L* > Ri) then {Ri, Li) 

else {Li,Ri) 

The sorter works by maintaining the right column in sorted order so that 
-Ri > ^2 > • ■ ■ > Rn ■ Additionally, the right entry is always greater than or 
equal to the corresponding left entry, i.e., Ri > Li for all 1 < i < n. These two 
invariants guarantee that Ri is the maximum entry in the sorter. 

The basic data structure is a sorted array (the right column) which is hard 
to represent as a characteristic function with BDDs. Let rj represent the j-th 
bit of the j-th word of the right column. If we choose a word-based BDD variable 
order, i.e., {r\,r \, . . . , r™, . . . , . . . , r™), then the BDD for the characteristic 

function is exponential in m. On the other hand, if we choose the bit-sliced 
order (r(, . . . , . . . , r™, . . . , r™), then the representation is exponential in 

n. If m = n, then we conjecture that the BDD for the characteristic function is 
exponential in n, irrespective of the variable order used. 

The sorted array can be efficiently represented by a Boolean Functional Vec- 
tor using BDDs, by organizing the words as a balanced binary search tree, e.g.. 
Figure 4, and using a bit-sliced BDD variable order. The words are constrained 
only by the values of their ancestors. The relation between the unrelated words 
is captured implicitly by transitivity through common ancestors. The largest 
BDDs are for the m-th bits of the leaves. These are linear in m and exponential 
in the depth of the tree, hence linear in n. Since there are n ■ m functions in 
the Boolean Functional Vector, one for each bit of each entry, the bound for the 
overall representation is quadratic in n and m. 




Fig. 4. Partial Order on Words for an Up-Down Sorter with n — 7 



Figure 5 plots the sizes of the Boolean Functional Vector (using the binary 
tree layout) and characteristic function representations (using the linear and bit- 
sliced BDD variable orders) for the reachable states of an up-down sorter with 
a depth of n {2n entries), each n bits wide (m = n). 
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Fig. 5. Size of BDD representations for Up-Down Sorters with with n = m 
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Fig. 6. Symbolic Reachability Analysis for Up-Down Sorters with n = m 



The partial order in this case is accurate enough to represent the actual state- 
space of the sorter so that there is no information lost in the abstraction. The 
ordering constraints between entries unrelated by the partial order are captured 
by implicit transitivity through their common ancestor. Hence, the use of the 
partial order does not affect the size of the state-space; any linear extension of 
the partial order (e.g. the DFS order) would give the same size for the final state- 
space. However, the partial order is useful during reachability analysis since it 
prevents the computation of unnecessary constraints. 

Figure 6 shows the peak live node count and runtime for reachability analysis, 
starting from a state with all entries set to 0. The experiments were performed 
on a Sun-Fire 280R (SUN UltraSPARC-HI-l- 1015 MHz CPU) with the memory 
limit set to 1 GB and the time limit set to 10 hours. The experiments with 
the characteristic function representation were performed using VIS with the 
MLP heuristics [8]. The current-state and next-state variables are interleaved in 
the BDD variable order. As expected, there is an exponential gap between the 
Boolean Functional Vector and characteristic function based approaches. Note 
that there is a significant performance improvement obtained by using the partial 
order BFVs instead of the totally ordered BFVs. The largest experiment with 
partially ordered BFVs (n = 20) had 800 state bits, all of which are relevant to 
the property, i.e. the global sorted order. 
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4.2 Example: FIFO Equivalence 

Consider the two implementations of a FIFO queue shown in Figure 7. In the 
shift register, new entries are inserted at the front of the queue, the other entries 
shifting over by one. In the ring buffer, the new entries are inserted at the 
head pointer. In both implementations, the oldest entries are read out at the tail 
pointer. We can check the equivalence of the two implementations by performing 
reachability analysis on the product machine. 

Given the value for the head pointer, there is a correspondence between the 
entries in the two queues. In general, though, we cannot fix such a correspondence 
since the head pointer can change. The BDD for the characteristic function of 
the reachable state-space is provably exponential in n, irrespective of the variable 
order. In [7], McMillan showed that there is a canonical conjunctive decomposi- 
tion for this state-space that is quadratic in n. The basic observation is that the 
entries are conditionally independent since we can fix the correspondence once 
the value of the control signals is known. The conjunctive decomposition factors 
out the conditionally independent variables into separate components. The same 
observations essentially apply to the Boolean Functional Vector representation. 

We can take this further by realizing that the entries in any one of the queues 
are not correlated. The correlation we are interested in is between entries in the 
shift register and the corresponding entries (given the control signals) in the 
ring buffer. Hence, we can use a partial order such as the one in Figure 7(c). We 
obtain approximately a 3X improvement in runtime with the partially ordered 
BFVs as compared to the totally ordered BFVs for all values of n tried. 

5 Conclusions and Future Work 

We have developed a general framework for symbolic simulation, model check- 
ing and abstraction using partially ordered Boolean Functional Vectors. Our 
examples demonstrated that this framework can allow us to efficiently verify 
some circuits where characteristic functions would fail. The most important fu- 
ture work from a practical point of view is the development of an automated 
abstraction-refinement framework and a dynamic reordering procedure for BFV 
components to work in tandem with BDD variable reordering. 




Fig. 7. Equivalence Checking for two Implementations of a FIFO 
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Abstract. The existence of functional dependency among the state vari- 
ables of a state transition system was identified as a common cause of 
inefficient BDD representation in formal verification. Eliminating such 
dependency from the system compacts the state space and may signifi- 
cantly reduce the verification cost. Despite the importance, how to de- 
tect functional dependency without or before knowing the reachable state 
set remains a challenge. This paper tackles this problem by unifying 
two closely related, but scattered, studies — detecting signal correspon- 
dence and exploiting functional dependency. The prior work on either 
subject turns out to be a special case of our formulation. Unlike previ- 
ous approaches, we detect dependency directly from transition functions 
rather than from reached state sets. Thus, reachability analysis is not a 
necessity for exploiting dependency. In addition, our procedure can be 
integrated into reachability analysis as an on-the-fly reduction. Prelimi- 
nary experiments demonstrate promising results of extracting functional 
dependency without reachability analysis. Dependencies that were un- 
derivable before, due to the limitation of reachability analysis on large 
transition systems, can now be computed efficiently. For the application 
to verification, reachability analysis is shown to have substantial reduc- 
tion in both memory and time consumptions. 



1 Introduction 

Reduction [12] is an important technique in extending the capacity of formal ver- 
ification. This paper is concerned with property-preserving reduction [7], where 
the reduced model satisfies a property if and only if the original model does. 
In particular, we focus on reachability-preserving reduction for safety property 
verification using functional dependency. 

The existence of dependency among state variables frequently occurs in state 
transition systems at both high-level specifications and gate-level implementa- 
tions [17]. As identified in [11], such dependency may cause inefficient BDD [6] 
representation in formal verification. Moreover, it can be used in logic minimiza- 
tion [13, 17]. Its detection, thus, has potential impact on formal verification and 
logic synthesis, and has attracted extensive research in both domains (e.g., see 
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[2, 13, 11, 8, 17]). The essence of all prior efforts [2, 13, 8, 17] can be traced back 
to functional deduction [5], where variable dependency is drawn from a single 
characteristic function. Thus, as a common path, the variable dependency was 
derived from the characteristic function of a reached state set. However, state 
transition systems of practical applications are often too complex to compute 
their reachable states, even though these systems might be substantially reduced 
only after variable dependency is known. An improvement was proposed in [8] 
to exploit the dependency from the currently reached state set in every iteration 
of a reachability analysis. The computation, however, may still be too expensive 
and may simplify subsequent iterations very little. 

To avoid such difficulty, we take a different path to exploit the dependency. 
The observation is that the dependency among state variables originates from 
the dependency among transition functions^. In consideration of efficiency, some 
variable dependency can better be concluded directly from the transition func- 
tions rather than from the characteristic function of a reached state set. There- 
fore, the computation requires only local image computation. As the derived 
dependency is an invariant, it can be used by any HDD- or SAT-based model 
checking procedure to reduce the verification complexity. Since not all depen- 
dency can be discovered this way due to the imperfect information about state 
reachability, this method itself is an approximative approach. To complete the 
approximative computation, our procedure can be embedded into reachability 
analysis as an on-the-fiy detection. Reachability analysis is thus conducted on a 
reduced model in each iteration. Our formulation leads to a unification of two 
closely related, but scattered, studies on detecting signal correspondence [10, 9] 
and exploiting functional dependency [11,8]. 

2 Preliminaries and Notations 

As a notational convention, a vector (or, an ordered set) v = {vi, . . . ,Vn) is 
specified in a bold-faced letter while its unordered version is written as {t>} = 
{ui, . . . ,Vn}- In this case, n is the cardinality (size) of both v and {d}, i.e., |t)| = 
|{v}[ = n. Also, when a vector v is partitioned into k sub-vectors V\, . . . ,Vk, 
the convention (t>i; . . . ; Vk) denotes that Vi, . . . ,Vk are combined into one vector 
with a proper reordering of elements to recover the ordering of v. 

This paper assumes, without loss of generality, that multi-valued functions 
are replaced with vectors of Boolean functions. The image of a Boolean functional 
vector over a subset C of its domain is denoted as Image{'tp,C)] the range 
of '0 is denoted as Range{xf). Let 0 : B” ^ B be a Boolean function over 
variables . . . ,Xn- The support set of ip is Supp{ip) = {xi \ {ip\xi=o XOR 
ip\x^=i) 0 false}. For a characteristic function F{x) over the set {a;} of Boolean 

^ As state transition systems are often compactly representable in transition functions 
but not in transition relations, this paper assumes that transition functions are the 
underlying representation of state transition systems. Consequently, our formulation 
is not directly applicable to nondeterministic transition systems. The corresponding 
extension can apply the MOCB technique proposed in [11]. 
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variables, its projection on {y} C {jc} is defined as F[{y}/{x}] = 3xi G 
|a;}\{j/}._F(a;). Also, we denote the identity function and its complement as 9 
and 9^, respectively. 

A state transition system At is a six-tuple (S', /, S, 17, S, A), where S is a finite 
set of states, / C S is the set of initial states, S and 17 are the sets of input 
and output alphabets, respectively, and d : S x S ^ S (resp. A : A x S ^ 17) 
is the transition function (resp. output function). As symbols and functions are 
in binary representations in this paper. At will be specified, instead, with a 
five-tuple (/, r, s, S, A), where r (resp. s) is the vector of Boolean variables that 
encodes the input alphabets (resp. states). 

3 Functional Dependency 

We formulate functional dependency for state transition systems in two steps. 
First, combinational dependency among a collection of functions is defined. 
Second, the formulation is extended to sequential dependency. 

3.1 Combinational Dependency 

Given two Boolean functional vectors ^ ^ B™ and : B* ^ B” over 

the same domain, we are interested in rewriting cj) in terms of a function of tp. 
The condition when such a rewrite is feasible can be captured by a refinement 
relation, fl C (B* ^ B™) x (B^ ^ B”), defined as follows. 

Definition 1. Given two Boolean functional vectors 0 : B* ^ B™ and ip : 
B^ ^ B”, p> refines cf in C C B^, denoted as Gq 'Ft if 4‘{o) ^ 4>{b) implies 
p>{a) yf <p{b) for all a,b G C. 

In other words, (p refines ^ in C if and only if (p is more distinguishing than 
4> in C. (As the orderings within (f> and ip are not a prerequisite, our definition 
of refinement relation applies to two unordered sets of functions as well.) In the 
sequel, the subscription C will be omitted from the refinement relation C when 
C is the universe of the domain. Based on the above definition, the following 
proposition forms the foundation of our later development. 

Proposition 1. Given <f> -.W ^ B™ and : B^ ^ B”, there exists a functional 
vector 0 : B” — > B'" such that <j> = 9 o p — 6{p{-)) over C QVf if and only if 
4> Cc V- Moreover, 9 is unique when restricting its domain to the range of p. 

For 4> = 9 o p, we call (f>i, . . . , (fra G <P the functional dependents (or, briefly, 
dependents), pi,...,pn G p the functional independents (or, briefly, inde- 
pendents), and 9i, . . . ,9n G 9 the dependency functions. 



Problem Formulation. The problem of detecting (combinational) functional 
dependency can be formulated as follows. Given a collection of Boolean functions 
xp, we are asked to partition xp into two parts <p and p such that <p = 9 (p). Hence, 
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the triple (^, 9) characterizes the functional dependency of xp. We call such a 

triple a dependency triplet. Suppose (f cannot be further reduced in {cp, cp, 9) 
by recognizing more functional dependents from <p with all possible modifications 
of 9. That is, \(p\ is minimized; equivalently, \cp\ is maximized. Then the triplet 
maximally characterizes the functional dependency of In this paper, we are 
interested in computing maximal functional dependency. (Although finding a 
maximum dependency might be helpful, it is computationally much harder than 
finding a maximal one as it is the supremum over the set of maximal ones.) 

The Computation. In the discussion below, when we mention Boolean func- 
tional vectors 4>{x) and p>{x), we shall assume that <p -.W ^ and : B^ ^ 
B” with variable vector x : B*. Notice that Supp{cf>) and Supp{ip) are subsets of 
{x}. The following properties are useful in computing dependency. 

Theorem 1. Given functional vectors <p and tp, <f> Q tp only if Supp{4>) C 
Supp{p>). 

Corollary 1. Given a collection of Boolean functions 'ipi{x), . . . ,i/)fe(x), if for 
any Xi € {x}, xpj is the only function such that Xi € Supplppj), then ipj is a 
functional independent. 

With the support set information. Theorem 1 and Corollary 1 can be used as a 
fast screening in finding combinational dependency. 

Theorem 2. Given functional vectors cp and p>, <p\Z p> if and only if\Range{<p)\ 
= \Range{{cp,ip))\. 

Theorem 3. Let Oi € 9 he the corresponding dependency function of a depen- 
dent (pi e cp. Let 6>° = {(p{x)\(pi{x) = 0} and 0\ = {p>{x)\(pi{x) = 1}. Then 
<Pi Q ‘P if and only if 0° n 0} = 0. Also, 9i has 0^, 0\, and B”\{6>° U 0}}as 
its off-set, on-set, and don’t-care set, respectively. That is, 6i{ip{x)) = (pfx) for 
all valuations of x. 

From Theorem 2, we know that the set {pf of functional independents is as 
distinguishing as the entire set {cp} U {p} of functions. Theorem 3, on the other 
hand, shows a way of computing dependency functions. 

Given a collection {■0} of Boolean functions, its maximal dependency can be 
computed with the procedure outlined in Figure 1. First, by Theorem 2, for each 
function ipi S {0} we obtain the minimal subsets of {0} which refine ^pi. Let the 
minimal refining subsets for ipi he Ep , . . . , Ep . (Notice that fc > 1 since ipi refines 
itself and, thus, {ipi} is one of the subsets.) The calculation can be done with 
local image computation because by Theorem 1 and Corollary 1 we only need 
to consider subsets of functions in {0} which overlap with ipi in support sets. 
Second, we heuristically derive a minimal set of functional independents that 
refines all the functions of {0}. Equivalently, for each ipi, some Ep’ is selected 
such that the cardinality of Ul=i minimized. This union set forms the 

basis of representing all other functions. That is, functions in the union set are 
the functional independents; others are the functional dependents. Finally, by 
Theorem 3, dependency functions are obtained with respect to the selected basis. 
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CombinationalDependency 

input: a collection {?/j} of Boolean functions 
output: a dependency triplet (</>, (p, 6) 

begin 

01 for each i/’i £ {'*/’} 

02 derive minimal refining sets E{, . . . , El 

03 select a minimal basis {(fi} that refines all tpi £ {i/i} 

04 compute the dependency functions {6} for {0} = 

05 return (0, 6) 

end 



Fig. 1. Algorithm: CombinationalDependency. 



A Digression. There were other variant definitions of dependency (see [15] for 
more examples). The functional dependency defined in [5] (Section 6.9), which 
follows [15], is too weak to be applicable in our application. We, thus, resort to a 
stronger definition. As noted below, our definition turns out to be consistent with 
functional deduction (see [5], Chapter 8), which is concerned with the variable 
dependency in a single characteristic function. 

We relate our formulation to functional deduction as follows. In functional 
deduction, variable dependency is drawn from a single characteristic function. 
Thus, to exploit the dependency among a collection of functions a single re- 

lation y) = /\^{yi = 4>i{x)) should be built, where yfs are newly introduced 
Boolean variables. In addition, to derive dependency solely among {y}, input 
variables {®} should be enforced in the eliminable subset [5]. With the foregoing 
transformation, variable dependency in functional deduction coincides with our 
defined functional dependency. A similar result of Theorem 3 was known in the 
context of functional deduction. Compared to the relational-oriented functional 
deduction, our formulation can be understood as more functional-oriented, which 
is computationally more practical. 

3.2 Sequential Dependency 

Given a state transition system A4 = (/, r, s, S, A), we consider the detection of 
functional dependency among the set {^} of transition functions. More precisely, 
detecting the sequential dependency of M is equivalent to finding 9 such that 
S is partitioned into two vectors: the dependents <5^, and the independents S^. 
Let {s} = {s<^}U {s<^} be such that the valuations of and are updated by 
Srf, and S^, respectively. Then 9 specifies the dependency of M hy s,p — 9{s^) 
and = 9{6,p), i.e., 5^{r, {9{s^); s^)) ^9 o S^{r, {9{s^); s^)). 

Sequential dependency is more relaxed than its combinational counterpart 
because of the reachability nature of A4 . The derivation of 9 shall involve a fixed- 
point computation, and can be obtained in two different ways, the greatest fixed- 
point (gfp) and the least fixed-point (Ifp) approaches, with different optimality 
and complexity. Our discussions start from the easier gfp computation, and 
continue with the more complicated Ifp one. The optimality, on the other hand, 
is usually improved from the gfp to the Ifp computation. 




Functional Dependency for Verification Reduction 



273 



Remark 1. We mention a technicality regarding the set / of initial states. In 
general, the combinational dependency among transition functions may not hold 
for the states in I because I may contain dangling states. (A state is called 
dangling if it has no predecessor states. Otherwise, it is non-dangling.) To 
overcome this difficulty, a new set /' of initial states is defined. Let /' be the 
set of states which are one-step reachable from I . Now, since all states in /' are 
non-dangling, the calculated dependency holds for On the other hand, the set 
of reachable states from / is identical to that from /' except for some states in I. 
In the verification of safety properties, such a substitution is legitimate as long 
as states in I satisfy the underlying property to be verified. In our discussion, 
unless otherwise noted, we shall assume that the set of initial states consists of 
only non-dangling states. 

The Greatest Fixed-Point Calculation. In the gfp calculation, state vari- 
ables are treated functionally independent of each other initially. Their depen- 
dency is then discovered iteratively. Combinational dependency among transition 
functions is computed in each iteration. The resultant dependency functions are 
substituted backward in the subsequent iteration for the state variables of their 
corresponding functional dependents. Thereby, the transition functions and pre- 
viously derived dependency functions are updated. More precisely, let 0^''^ be 
the set of derived dependency functions for at the ith iteration. For j from 
f — 1 to 1, the set of dependency functions is updated in order with 

. . . ; After the updates of 

jg ggi^ . . . ; where C {^} cor- 

responds to the functional independents of . At the {i + l)st iteration, the 
combinational dependency among is computed. The iteration terminates 

when the size of the set of functional independents cannot be reduced further. 
The termination is guaranteed since | decreases monotonically. In the end of 
the computation, the final 0 is simply the collection of and the final set of 

<^(k) 

functional independents is , where k is the last iteration. The computation is 
summarized in Figure 2, where the procedure CombinationalDependency Restore 
is similar to ComhinationalDependency with a slight difference. It computes the 
dependency among the set of functions given in the first argument in the same 
way as ComhinationalDependency. However, the returned functional dependents 
and independents are the corresponding functions given in the second argument 
instead of those in the first argument. 

Notice that the final result of the gfp calculation may not be unique since, 
in each iteration, there are several possible choices of maximal functional depen- 
dency. As one choice has been made, it fixes the dependency functions for state 
variables that are declared as dependents. Thereafter, the dependency becomes 
an invariant throughout the computation since the derivation is valid for the 
entire set of non-dangling states. For the same reason, the gfp calculation may 
be too conservative. Moreover, the optimality of the gfp calculation is limited 
because the state variables are initially treated functionally independent of each 
other. This limitation becomes apparent especially when the dependency to be 
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SequentialDependencyGfp 

input: a state transition system A4 = (/, r, s, S, X) 
output: a dependency triplet {5^, 8^,0) for 8 

begin 



01 


i ■- 0; := 8 






02 


repeat 






03 


if i > 2 






04 


for j from i - 


- 1 to 1 




05 


:= 




.;0«(s«);s«)) 


06 


if i > 1 






07 






...;0«(s«);4'’)) 


08 


i := i -\-l 






09 


(8f,8^;\e^^)-. 


= CombinationalDependency Restore (8^^\8) 


10 


until = 1^)^^ 


1 




11 


return ; 




..;0(-i))) 



end 



Fig. 2. Algorithm: SequentialDependencyGfp. 



discovered is between two state transition systems (e.g., in equivalence check- 
ing) . To discover more dependency, we need to adopt a least fixed-point strategy 
and refine the dependency iteratively. 



The Least Fixed-Point Calculation. In the Ifp calculation, unlike the gfp 
one, the initial dependency among state variables is exploited maximally based 
on the set of initial states. The dependency is then strengthened iteratively 
until a fixed point has been reached. The set of functional independents tend to 
increase during the iterations, in contrast to the decrease in the gfp calculation. 

Consider the computation of initial dependency. For the simplest case, when 
|/| = 1, any state variable can be selected as the basis. Any other variable 
is replaced with either 9(s^) or depending on whether its initial value 

equals that of s^p or not. For arbitrary /, the initial variable dependency can be 
derived using functional deduction on the characteristic function of I. (As noted 
in Remark 1, excluding dangling states from I reveals more dependency.) 

For the iterative computation, transition functions are updated in every it- 
eration by eliminating dependent state variables with the latest dependency 
functions. Combinational dependency is then obtained for the new set of tran- 
sition functions. Unlike the gfp iterations, the obtained functional dependency 
in the ith iteration may not be an invariant for the following iterations because 
the derived dependency may be valid only in the state subspace spanned by 
As the state subspace changes over the iterations due to different selec- 
tions of independent state variables, the dependency may need to be rectified. 
Notice that the set of functional independents may not increase monotonically 
during the iterations. This non-convergent phenomenon is due to the existence 
of the don’t-care choices of 0*-*^ in addition to the imperfect information about 
the currently reachable state set. Therefore, additional requirements need to be 
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SequentialDependencyLfp 

input: a state transition system M = (/, r, s, S, X) 
output: a dependency triplet {5^, 8^,0) for 8 

begin 

01 i ~ 0; 0*-°^) := InitialDependency{I) 

02 repeat 

03 i := i + 1 

04 := 8{r, 

05 CombinationalDependencyReuse{8^'\0^^~^^) 

06 until 6»0) = 6>0-i) 

07 return 00)) 

end 

Fig. 3. Algorithm: SequentialDependencyLfp. 



imposed to guarantee termination. Here we request that, after a certain number 
of iterations, the set of independent state variables increase monotonically until 
00) can be reused in the next iteration, that is, the fixed point is reached. The 
algorithm is outlined in Figure 3. To simplify the presentation, it contains only 
the iterations where {s^^} increases monotonically. Procedure CombinationalDe- 
pendencyReuse is the same as CombinationalDependency except that it tries to 
maximally reuse the dependency functions provided in its second argument. 

In theory, the optimality of the Ifp calculation lies somewhere between that of 
the gfp calculation and that of the most general computation with reachability 
analysis. Since not all dependency in A4 can be detected by the Ifp procedure 
due to the imperfect information about the reachable states, the algorithm is 
incomplete in detecting dependency. To make it complete, reachability analysis 
should be incorporated. We postpone this integration to the next section and 
phrase it in the context of verification reduction. 

Remark 2. Notice that when 0(*^’s are restricted to consisting of only identity 
functions and/or complementary identity ones, refinement relation C among 
transition functions reduces to an equivalence relation; the Ifp calculation of 
sequential dependency reduces to the detection of equivalent state variables. 
Hence, detecting signal correspondence [9] is a special case of our formulation. 

4 Verification Reduction 

Here we focus on the reduction for safety property verification, where reacha- 
bility analysis is the core computation. The verification problem asks if a state 
transition system M = {I,r,s,S,X) satisfies a safety property P, denoted as 
M\= P, for all of its reachable states. 

Suppose that is a dependency triplet of S; let and be 

the corresponding state variables of and <5,^, respectively. To represent the 
reachable state set, either s or can be selected as the basis. Essentially, 
R{s) = Expand{R-'-{s^),{S(i,,s^,6)) = R^{s^) A where R 
and R^ are the characteristic functions representing the reachable state sets in 
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ComputeReachWithDependency Reduction 

input: a state transition system Ml = (/, r, s, S, X) 
output: the set R of reachable states of A4 

begin 

Of i ~ 0; {s^^\ := InitialDependency{I) 

02 := /[{4°>}/{s}] 

03 R^° := 7-^0; := I^° 

04 repeat 

05 i :=i + l 

06 := S{r, 4 "^^)) 

07 0^*^) := CombinationalDependencyReach{S^^\d^^~^\ 

08 := 

09 Si/ := di, := Si/’s corresponding functions in 

10 R-'-*-^ := Expand{R^^-^ 

11 := 77^-1 [{s«}/{s« U 

12 F^' := simplify T^' with 77^*“^ as don’t care 

13 77-^* := 77-^*-i UT-^* 

14 until 77-^* = 77^*”i 

15 return Expand {R-^ *, {s^^\ s^\ 6^'^'^)) 
end 



Fig. 4. Algorithm: ComputeReachWithDependencyReduction. 



the total space and, respectively, in the reduced space spanned by s^. Let P{s) 
denote the states that satisfy P. Checking whether 77(a) ^ 7^(a) is equivalent to 
checking whether 77^(a,^) ^ P^{sip), where P^(s^) = P{{6{s^); s^)). Hence, 
the verification problem can be carried out solely over the reduced space. As 
noted in Remark 1, the set 7 of initial states might require special handling. 

For given dependency, reachability analysis can be carried out solely upon 
the reduced basis. The validity of the given dependency can be tested in every 
iteration of the reachability analysis as was done in [11]. Below we concentrate on 
the cases where dependency is not given. We show how the detection of functional 
dependency can be embedded into and simplify the reachability analysis. 

To analyze the reachability of a transition system with unknown dependency, 
two approaches can be taken. One is to find the sequential dependency with the 
forementioned gfp and/or Ifp calculation, and then perform reachability analysis 
on the reduced state space based on the obtained dependency. The other is to 
embed the dependency detection into the reachability analysis as an on-the-fly 
reduction. Since the former is straightforward, we only detail the latter. Figure 4 
sketches the algorithm. Procedure CombinationalDependency Reach is similar to 
CombinationalDependency Reuse with two exceptions: First, the derived depen- 
dency is with respect to the reached state set provided in the third argument. 
Second, the set of independent state variables needs not increase monotonically 
since the termination condition has been taken care of by the reached state sets. 
In each iteration of the state traversal, the previously reached state set 7? is 
adjusted (by the expansion and projection operations) to a new basis according 
to the derived dependency triplet. 
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5 Experimental Results 

The forementioned algorithms have been implemented in the VIS [4] environ- 
ment. Experiments were conducted on a Sun machine with a 900-MHz CPU 
and 2-Gb memory. Three sets of experiments have results shown in Tables 1, 2, 
and 3, respectively. Table 1 demonstrates the relative power of exploiting depen- 
dency by the detection of signal correspondence, the gfp, and Ifp calculations 
of sequential dependency. Table 2 compares their applicabilities in the equiv- 
alence checking problem. Finally, Table 3 shows how reachability analysis can 
benefit from our computation of functional dependency. In the experiments, all 
the approaches under comparison use the same BDD ordering. In addition, no 
reordering is invoked. 



Table 1. Comparisons of Capabilities of Discovering Dependency. 



Circuit 


State Var. 


Signal Corr. 


9| 


Seq. 


Dep. Gfp 




Seq. 


Dep. Lfp 


indp. 


iter. 


mem. 

(Mb) 


time 

(sec) 


indp. 


iter. 


mem. 

(Mb) 


time 

(sec) 


indp. 


iter. 


mem. 

(Mb) 


time 

(sec) 


s298-rt 


34 (14) 


31 


5 


10 


0.3 


23 


2 


23 


1.6 


24 


10 


41 


6.2 


s499-rt 


41 (22) 


41 


21 


13 


1.6 


29 


1 


23 


11.6 


29 


22 


23 


8.2 


s510-rt 


34 (6) 


32 


4 


13 


0.4 


21 


2 


51 


17.5 


23 


6 


58 


81.1 


s526n-rt 


64 (21) 


55 


4 


13 


1.0 


37 


2 


60 


104.2 


40 


14 


58 


26.8 


s635-rt 


51 (32) 


50 


16 


13 


0.6 


34 


2 


13 


2.8 


34 


33 


21 


7.4 


s838 . 1-rt 


73 (32) 


48 


20 


13 


1.5 


33 


1 


22 


3.7 


33 


46 


21 


18.3 


s991-rt 


42 (19) 


24 


2 


13 


0.5 


21 


2 


21 


1.4 


20 


2 


21 


1.4 


multl6a-rt 


106 (16) 


66 


6 


13 


0.9 


75 


2 


13 


1.0 


61 


8 


13 


4.6 


tbk-rt 


49 (5) 


49 


2 


49 


6.8 


13 


4 


62 


264.1 


21 


3 


59 


48.4 


S3271 


116 


114 


6 


29 


2.1 


116 


0 


29 


3.0 


114 


6 


45 


12.6 


S4863 


104 


81 


3 


47 


4.7 


81 


1 


69 


178.7 


75 


3 


47 


14.5 


S5378 


179 


163 


12 


37 


6.5 


155 


2 


51 


15.9 


154 


14 


51 


43.1 


S9234.1 


211 


188 


18 


99 


79.5 


189 


2 


97 


250.2 


184 


38 


99 


967.6 


S13207 


669 


303 


16 


138 


95.6 


460 


5 


111 


384.6 


263 


37 


100 


836.0 


S15850 


597 


431 


24 


142 


221.7 


569 


3 


134 


1487.1 


315 


32 


142 


1441.0 


S35932 


1728 


1472 


31 


281 


599.8 


1728 


0 


146 


34091.5 


- 


- 


- 


> 10^ 


S38584 


1452 


869 


17 


303 


525.5 


1440 


1 


155 


4103.3 


849 


25 


303 


22001.1 


8085 


193 


91 


15 


65 


28.9 


193 


0 


70 


42.4 


79 


17 


63 


64.3 



Compared in Table 1 are three approaches: the computation of signal corre- 
spondence [9], the gfp, and Ifp calculations of sequential dependency. The first 
two columns list the benchmark circuits and their sizes in state variables. The 
original sizes of retimed circuits (for timing optimization) are listed in the follow- 
ing parentheses. For each compared approach, four columns in order list the sizes 
of the computed independent state variables, the required numbers of iterations, 
memory usage, and CPU time. Among these three approaches, the minimum 
sizes of independent variables are highlighted in bold. It is evident from Table 1 
that the Ifp calculation of sequential dependency subsumes the detection of sig- 
nal correspondence in both generality and optimality. On the other hand, the 
powers of the Ifp and gfp calculations are incomparable in practice. They have 
different directions of approximating reachable state sets. For the gfp calculation, 
the unreachable state set is gradually pruned each time dependency functions 
are substituted backward. For the Ifp one, the reachable state set grows with the 
iterative computation. It turns out that the gfp computation is very effective 
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in exploiting dependency for retimed circuits. For instance, in circuit tbk-rt, 
13 variables are identified as independents by the gfp calculation, compared to 
24 by the Ifp one. In general, the gfp computation uses much fewer iterations 
than the other two approaches. In contrast, the Ifp calculation outperforms the 
other two approaches in circuits not retimed. The table also reveals that all the 
approaches do not suffer from memory explosion. Rather, the time consumption 
may be a concern in the gfp and Ifp calculations of sequential dependency. This 
is understandable because testing the refinement relation is more general and 
complicated than testing the equivalence relation used in the detection of signal 
correspondence. Fortunately, the tradeoff between quality and time can be easily 
controlled, for example, by imposing fc-substitutability, which uses up to k func- 
tions to substitute a dependent function. With our formulation, dependencies 
that were underivable before, due to the limitation of reachability analysis on 
large transition systems, can now be computed efficiently. 



Table 2. Comparisons of Capabilities of Checking Equivalence. 



Circuit 


State Var. 


Signal Corr. 


9| 


Seq. Dep. Gfp 


Seq. Dep. Lfp 


indp. 


iter. 


mem. 

(Mb) 


time 

(sec) 


indp. 


iter. 


mem. 

(Mb) 


time 

(sec) 


indp. 


iter. 


mem. 

(Mb) 


time 

(sec) 


s298 


14+34 


39 


5 


10 


0.5 


37 


2 


21 


1.5 


30 


13 


31 


4.4 


s499 


22+41 


63 


21 


14 


3.1 


43 


2 


38 


7.3 


42 


22 


45 


23.6 


s510 


6-1-34 


38 


4 


13 


0.6 


27 


2 


50 


25.9 


29 


5 


36 


39.8 


s526n 


21-1-64 


69 


8 


13 


2.4 


58 


2 


59 


121.9 


50 


12 


58 


31.8 


s635 


32+51 


66 


31 


13 


7.8 


66 


1 


21 


1.4 


51 


33 


25 


9.1 


S838.1 


32-1-73 


78 


31 


25 


16.8 


65 


2 


48 


4.2 


59 


47 


37 


22.5 


s991 


19+42 


42 


2 


22 


1.5 


40 


2 


38 


2.5 


39 


3 


41 


5.4 


multl6a 


16+106 


82 


6 


14 


4.6 


91 


2 


14 


1.7 


77 


8 


26 


5.1 


tbk 


5-1-49 


54 


2 


44 


5.5 


17 


4 


61 


175.6 


25 


3 


59 


86.4 



With similar layout to Table 1, Table 2 compares the applicabilities of these 
three approaches to the equivalence checking problem. Here a product machine 
is built upon a circuit and its retimed version. As noted earlier, the gfp calcu- 
lation itself cannot prove the equivalence between two systems. It, essentially, 
computes the dependency inside each individual system, but not the interdepen- 
dency between them. On the other hand, the detection of signal correspondence 
can rarely prove equivalence unless the two systems under comparison are almost 
functionally identical. In contrast, the Ifp calculation of sequential dependency 
can easily prove the equivalence between two systems where one is forwardly 
retimed from the other, and vice versa. Arbitrary retiming, however, may cause 
a failure, although in principle there always exists a Ifp calculation that can con- 
clude the equivalence. In Table 2, since the retiming operations on the retimed 
circuits involve both forward and backward moves, none of the approaches can 
directly conclude the equivalences. However, as can be seen, the Ifp calculation 
can compactly condense the product machines. 

Although detecting dependency can reduce state space, it is not clear if the 
HDD sizes for the dependency functions and the rewritten transition functions 
are small enough to benefit reachability analysis. In Table 3, we justify that 
it indeed can improve the analysis. Some hard instances for state traversal are 
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Table 3. Comparisons of Capabilities of Analyzing Reachability. 



Circuit 


Iter. 


R.A. 


w/o Dep. Reduction 


R.A. 


w Dep. Reduction 




peak 

(bdd nodes) 


reached 
(bdd nodes) 


mem. 

(Mb) 


time 

(sec) 


peak 

(bdd nodes) 


reached 
(bdd nodes) 


mem. 

(Mb) 


time 

(sec) 


S3271 


4 


28819301 


16158242 


620 


2784.1 


18843837 


10746053 


415 


1082.6 


S4863 


2 


18527781 


248885 


365 


404.8 


549006 


8772 


67 


13.1 


S5378 


2 


- 


- 


> 2000 


- 


1151439 


113522 


70 


21.5 


S15850 


15 


29842889 


9961945 


653 


21337.4 


17667076 


6356714 


463 


8175.0 


8085 


50 


16663749 


1701604 


390 


24280.2 


7830602 


1338322 


212 


4640.1 



studied. We compare reachability analyses without and with on-the-fly reduc- 
tion using functional dependency. In the comparison, both analyses have the 
same implementation except switching off and on the reduction option. The sec- 
ond column of Table 3 shows the steps for (partial) state traversal. For each 
reachability analysis, four columns in order shows the peak number of live BDD 
nodes, the size of the BDD representing the final reached state set, memory us- 
age, and CPU time. It is apparent that, with the help of functional dependency, 
the reachability analysis yields substantial savings in both memory and time 
consumptions, compared to the analysis without reduction. 

6 Comparisons with Related Work 

Among previous studies [11,8] on exploiting functional dependency, the one 
closest to ours is [8] while functional dependency in [11] is assumed to be given. 
The method proposed in [8] is similar to our reachability analysis with on-the- 
fiy reduction. However, several differences need to be addressed. First, previous 
dependency was drawn entirely from the currently reached state set (using func- 
tional deduction) rather than from the transition functions. Thus, in each itera- 
tion of their reachability analysis, image computation need to be done before the 
detection of new functional dependency. The image computation rarely benefits 
from functional dependency. In contrast, our approach is more effective because 
the dependency is discovered before the image computation, which is performed 
on the reduced basis. Second, as previous dependency was obtained from a cur- 
rently reached state set, not from transition functions, it is not as robust as ours 
to remain valid through the following iterations. Third, the prior method cannot 
compute functional dependency without reachability analysis while our formu- 
lation can be used as a stand-alone technique. Also, we identify a new initial set 
of non-dangling states. It uncovers more dependency to be exploited. 

For related work specific to sequential equivalence checking, we mention [16, 
1, 18,9]. Among them, the work of [9] is the most relevant to ours; it is a special 
case of our Ifp calculation as noted in Remark 2. While these prior efforts focus 
on equivalence checking, ours is more general for safety property checking. 

7 Conclusions and Future Work 

We formulate the dependency among a collection of functions based on a refine- 
ment relation. When applied to state transition systems, it allows the detection of 
functional dependency without knowing reached state sets. With an integration 





280 Jie-Hong R. Jiang and Robert K. Brayton 



into a reachability analysis, it can be used as a complete verification procedure 
with the power of on-the-fly reduction. Our formulation unifies the work of [9] 
and [8] in the verification framework. In application to the equivalence checking 
problem, our method bridges the complexity gap between combinational and se- 
quential equivalence checking. Preliminary experiments show promising results 
in detecting dependency and verification reduction. 

As a future research direction, our results might be reformulated in a SAT- 
solving framework. A path similar to that of [3], where van Eijk’s algorithm was 
adjusted, could be taken to prove safety properties by strengthened induction. 
Because our approach may impose more invariants than just signal correspon- 
dence, we believe that SAT-based verification can benefit from our results. 
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Abstract. This paper shows how to harness decision procedures to automatically 
verify safety properties of imperative programs that perform dynamic storage al- 
location and destructive updating of structure fields. Decidable logics that can 
express reachability properties are used to state properties of linked data struc- 
tures, while guaranteeing that the verification method always terminates. The 
main technical contribution is a method of structure simulation in which a set of 
original structures that we wish to model, e.g., doubly linked lists, nested linked 
lists, binary trees, etc., are mapped to a set of tractable structures that can be rea- 
soned about using decidable logics. Decidable logics that can express reachability 
are rather limited in the data structures that they can directly model. For instance, 
our examples use the logic MSO-E, which can only model function graphs; how- 
ever, the simulation technique provides an indirect way to model additional data 
structures. 



1 Introduction 

In this paper, we explore the extent to which decidable logics can help us check prop- 
erties of programs that perform dynamic storage allocation and destructive updating 
of structure fields. One of our key contributions is a method of structure simulation in 
which a set of original structures that we wish to model, e.g., doubly linked lists, nested 
linked lists, binary trees, etc., are mapped to a set of tractable structures that can be 
reasoned about using decidable logics. 

1.1 Motivation 

Automatically proving safety and liveness properties of sequential and concurrent pro- 
grams that permit dynamic storage allocation and low-level pointer manipulations is 
challenging. Dynamic allocation causes the state space to be infinite. Also, abstract- 
datatype operations are implemented using loops, procedure calls, and sequences of 
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low-level pointer manipulations; consequently, it is hard to prove that a data- structure 
invariant is reestablished once a sequence of operations is finished [7]. 

Reachability is crucial for reasoning about linked data structures. For example, to 
verify that a memory configuration contains no garbage elements, we must show that 
every element is reachable from some program variable. Specifying such properties 
presents a challenge because it requires a logic with the ability to quantify over un- 
bounded resources and to express reachability properties. Even simple decidable frag- 
ments of first-order logic become undecidable when reachability is added [9]. The 
reader may wonder how undecidable logics can be useful for automatic verification. 

1.2 An Overview of the Approach 

In this section, we illustrate the simulation technique by showing its applicability to 
semi-automatic Hoare-style verification. The technique can also be applied to improve 
the precision of operations used in abstract interpretation [14]. 



Hoare-Style Verification: Recall that in Hoare-style verification, a programmer ex- 
presses partial-correctness requirements of the form {pre}st{post}, where pre and 
post are logical formulas that express the pre- and post-condition of statement st. To 
handle loops, it is necessary that loop invariants be provided (also specified as logi- 
cal formulas). From these annotations, a formula (p is generated, called the verification 
condition of the program; (p is valid if and only if the program is partially correct. 

In this paper, we allow pre-conditions, post-conditions, and loop invariants to be 
specified in FO(TC): first-order formulas with transitive closure. The generated verifi- 
cation condition is also an FO(TC) formula. The logic FO(TC) is natural because it 
can express pointer dereferencing and dynamic creation of objects and threads. How- 
ever, validity in this logic is undecidable, and therefore the validity of a program’s 
verification condition cannot be checked directly. 



Restricting the Set of Reachable Program States: To perform verification automati- 
cally, we will restrict the set of program states sufficiently to make the validity problem 
for FO(TC) decidable. More precisely, for every program and class of data structures 
of interest, we choose a designated set of structures C, such that the problem of checking 
whether a general FO(TC) formula holds for all structures in C is decidable. In other 
words, we work with FO(TC) formulas, for which the validity problem is undecidable 
in general, but by restricting our attention to the class of structures in C, we change the 
problem into one that is decidable. 



Simulation Invariants: Because C is defined per class of data structures and per pro- 
gram, we cannot assume that the set C is defined a priori. Instead, we assume that the 
class C is also definable by an FO(TC) formula, i.e., there exists a formula SI such 
that .4 G C iff .4 ^ SI. SI is called the simulation invariant, and is similar to a data- 
structure invariant or a class invariant. This formula can usually be defined once the 
data structure that we wish to model is known. 
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From a logical perspective, restricting attention to structures satisfying SI allows us 
to check if an FO(TC) property (p holds in all structures in C by showing the validity 
of 

SI (1) 



First-Order Reductions: Instead of implementing a decision procedure that is para- 
metric in SI for validity of formulas of the form Eq. (1), we provide a way to translate 
these formulas automatically into another decidable logic, with an existing decision 
procedure. We define admissibility conditions that guarantee that an FO(TC) formula 
of the form Eq. (1) is valid if and only if the corresponding formula in the decidable 
logic is valid. This allows us to use existing decision procedures to determine the valid- 
ity of verification conditions. In the paper, we illustrate the method using the decidable 
logic MSO-E, i.e., weak monadic second-order logic on function graphs. FO(TC) for- 
mulas over a restricted set of graphs - but including possibly cyclic graphs, and graphs 
containing nodes that have in-degree greater than one - are translated into MSO-E for- 
mulas on function graphs. This opens up the possibility to apply MSO-E to a larger set 
of graphs. 

Handling Low-Level Mutations: Destructive updates of pointer- valued structure 
fields might cause the simulation invariant to be violated. Therefore, the verification 
condition must also check whether the simulation invariant is maintained. For example, 
we change the requirement for a statement from {pre}st{post} to {pre A SI}st{post A 
SI}. This guarantees that the method will never miss an error: it will either detect an 
error in the specification, or a violation of the simulation invariant by some operation. 

Even correct programs can temporarily violate a data- structure invariant. That is, 
a violation of the data-structure invariant need not indicate an error. Eor example, it is 
possible to rotate an acyclic singly-linked list by first creating a cycle, and later breaking 
the cycle. This complicates the task of defining simulation invariants. In particular, the 
user needs to allow simulation invariant SI to be less restrictive than the data-structure 
invariant for the data-structure of interest. Still, we believe that this approach can handle 
realistic programs that manipulate their data structures in a limited way, e.g., when the 
number of mutated pointer fields is limited. The programmer can specify an SI that 
allows sufficient flexibility, and assert that the data-structure invariant is reestablished 
at certain (“stable”) states. 

1.3 Main Results 

The contributions of this paper can be summarized as follows: 

- We state the precise requirements that allow the method to be applied to verify 
properties of imperative programs written in a programming language with pointers 
and destructive updates (Section 2.2 ; for proofs see the full version of the paper). 

- We show that programs that manipulate commonly used data structures, such as 
singly-linked (including cyclic and shared) lists, doubly-linked lists, and trees with 
unbounded branching, satisfy the above requirements (Section 2.3). (We note that 
for structures that are already tractable, such as singly linked lists, simple trees, etc., 
our methods work trivially.) 
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- In Section 3.1, we show how to use our method for Hoare-style verification. We 
assume that the programmer specifies loop invariants and pre- and post- conditions 
for procedures. 

- In Section 3.2, we show how to use our method for automatic verification using 
abstract interpretation. This eliminates the need to provide loop invariants, but may 
only establish weaker safety properties than those proved by Hoare-style verifi- 
cation. This fills in the missing piece of the algorithm presented in [14], which 
requires a decidable logic to compute the best abstract transformer automatically. 
This allows us to apply the algorithm from [14] to imperative programs that per- 
form destructive updates. We have implemented this using TVLA and MONA. 



2 Simulation 

In this section, we define the notation used, and describe the simulation methodology 
in detail. 

2.1 Notation 

For any vocabulary r, let £(r) denote a logic with vocabulary t. In our examples, 
£(t) is first-order logic with transitive closure. We allow arbitrary uses of a binary 
transitive closure operator, TC; TGu, u'Vp] denotes the reflexive, transitive closure of 
binary relation (p{u,u') [8]. We omit the subscript from TC„ „/ whenever it is clear 
from the context. In principle, however, we could use other logics, such as second- 
order logic. Let STRUCT[r] denote the set of finite structures over vocabulary r. We 
denote the set of structures that satisfy a closed formula (p by |(^] C STRUCT[r]. 

Let Org be a class of original structures. Let Torg = ■ • ■ R‘k‘} be the vo- 

cabulary of Org, where the arity of the relation symbol Ri is a^. Note that Torg may 
be arbitrarily rich. Org is a subset of STRUCT[rorg]- Not all structures of Org can be 
simulated. We assume that the user provides a simulation invariant, SI, that defines the 
subclass \[SI ]] of Org to be simulated. SI is a global invariant that must hold throughout 
the program. 

Example 1. Let Org be the set of forests of binary trees, represented as logical struc- 
tures over Torg = where the two binary relations I and r denote pointers to the 

left and right subtree. SI, shown in the third column of Table 1, specifies that I and r 
are partial functions, and forbids structures with shared nodes or cycles created by I and 
r pointers. It is clear that all forests of binary trees satisfy this requirement. A structure 
in Torg satisfies SI if and only if it represents a forest of binary trees. 

We use this example of binary trees throughout the section to demonstrate our 
method. We use the notation TREE{1, r} (and other similar notation in Section 2.3), 
in the same manner as a data- type in a programming language, and not a specific object 
of the tree data-type. 

Let Rep be the set of representation structures, and let I? be a decidable logic over 
Rep. That is, we have an algorithm that halts on all inputs and tells us for every formula 
tp G T> whether ip holds for all structures in Rep. 
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In our examples, Rep will always be a set of (finite) function graphs, i.e., every 
vertex has at most one edge leaving it. Let TRep denote the vocabulary of Rep. We 
illustrate our method by fixing T> to be MSO-E, which is weak monadic second-order 
logic on function graphs. The vocabulary TRep includes one binary relation symbol, 
E, which must be interpreted by a relation that is a partial function. Arbitrary unary 
relation symbols and constant symbols may be included. Reachability is expressible 
in MSO-E using second-order quantification. MSO-E is decidable by reduction to the 
monadic second-order theory of one unary function [ 11 ]. 

In addition to the simulation invariant, SI, we assume that the user provides a rep- 
resentation invariant, RI, that defines the subclass |i?/] of Rep. 

2.2 Simulation Methodology 

We define a first-order mapping, t] : STRUCT[rflep] — > STRUCT[Torg]- 

Definition 1. (First-order mapping rj) Let A £ STRUCT[TRep] be a structure with 
universe |^|. For each relation symbol Ri £ Torg of arity Ui, rj provides a formula 
. . - Vai) £ C{TRep). The structure rj{A) has the same universe as A^. The 
relation symbol Ri is interpreted in rj{A) as follows: 

Rf^'^ = {{e^,...,ea,)£\Ar \ A^5[Ri\{e^,...,ea,)] . 

Example 2. The second column of Table 1 lists the 5 formulas for the main original 
data structures that we consider. The representation of the TREE {I, r} example uses 
TRep = {E^, L^,R^}. For example, U 2 ) is the formula E{v 2 ,vi) A L{v 2 ). In- 

tuitively, the simulation reverses the I- and r-edges (recording them as i?-edges), and 
marks each node that was a target of an Ledge or r-edge with L or R, respectively. 

The first-order mapping on structures, 77 , defines a dual, first-order translation on for- 
mulas in the opposite direction, fj : C{rorg) E{TRep). 

Definition 2. (First-order translation rj) For any formula p £ C{Torg), rj{(p) £ 
C{TRep) is the result of replacing each occurrence in of Ri{t\, . . . ,taf), by the for- 
mula S[Ri]{ti, . . . ,taf). 

It follows immediately that for all A £ STRUCT[r/{ep] and p £ C{Torg), 

A h rj{ip) 4A r]{A) h V? • (2) 

Definition 2 and Eq. (2) follow from [ 8 , Proposition 3.5] which can be easily extended 
to handle the transitive-closure operator allowed in £(rorg)- 

See Figure 1 for a sketch of the mapping rj from Rep to Org and the corresponding 
dual rj from C{Torg) T{TRep). To guarantee that queries over the original structures 
can be precisely answered in the representation, the following restrictions are imposed 
on 77, SI, and RI. 

* For simplicity, we make this same-universe assumption in this paper. Actually first-order map- 
pings may change the size of the universe by any polynomial function [8]. 
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Fig. 1. SI defines a class of structures that can be mapped to Rep. 



Definition 3. (Admissibility requirement) Given a first-order mapping 
rj\ STRUCT[TRep\ — *■ STRUCT[Torg\, a simulation invariant SI such that 15/] C Org, 
and a representation invariant RI such that |i?/| C Rep, we say that ( 77 , SI , RI) is 
admissible if the following properties hold: 

Well-Definedeness of 77 . For all Sr € Rep such that Sr |= RI, t]{Sr) H 
Ontoness of 77 . For all So G Org such that So 1= SI, there exists a structure Sr G Rep 
such that Sr \= RI and t]{Sr) = So- 

In fact, for all examples shown in Table 1, we do not provide RI explicitly, but 
define it to be rj{SI). The following lemma provides a sufficient condition for admissi- 
bility of ( 77 , 5/, 77 ( 5 /)). 

Lemma 1. If for all So € Org such that So |= SI, there exists Sr € Rep such that 
r]{Sr) = So, then ( 77 , SI,rj{SI)) is admissible. 

Admissibility guarantees that each query in the original language is translated by rj to 
a query in the representation language, where it can be answered using the decision 
procedure: 

Theorem 1. (Simulation Theorem) Assume that {rj, SI, RI) defines an admissible 
simulation. Then for any p G C(rorg), SI ^ p holds in Org iff RI ^ holds in 
Rep. 

Corollary 1. Assume that (i) ( 77 , SI , RI) defines an admissible simulation, (ii) for all 
ip G C{Torg), RI ^ tiiyp) G T>, and (Hi) T> is decidable over Rep. Then there exists an 
algorithm that for any p G C{Torg), checks whether SI ^ ip holds in Org. 

A simpler way to express condition (i) is with the equation, |5/| = 77(|i?/|). We 
show that condition (i) holds for all examples in Section 2.3. From Definition 2 and 
Eq. (2), it immediately follows that rj{C{Torg)) I^ifRep), which in turn is contained 
in monadic second-order logic over one binary relation E (not necessarily interpreted 
as a partial function). Because in all of our examples RI ensures that E is a partial 
function, RI rj{C{Torg)) is in MSO-E. Thus, condition (ii) holds for all of the 
examples that we consider. This allows us to use a decision procedure for MSO-E in our 
examples, to check validity of FO(TC) formulas over |5/|. Instead of implementing a 
decision procedure for MSO-E, we have implemented a translation of MSO-E formulas 
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into WS2S (using the simulation method, as described in the full paper), and used an 
existing decision procedure MONA [5] for WS2S logic. 

Remark. If the decidable logic T> places a syntactic restriction on formulas, as is the 
case for 3V(DTC[£^]) [9], then condition (ii) of Corollary 1 may not be satisfied. In 
such cases, it may be possible to replace rj (Definition 2) with an algorithm (or a set of 
heuristics) that generates a formula ip G T> that is equivalent to RI rj{ip) over Rep. 
From our experience, the main difficulty is to ensure that ^RI is in T>. This problem can 
be addressed by choosing a stronger representation invariant RI with |i?/] C |^(S'/)], 
such that RI G T>, and a corresponding simulation invariant SI' (stronger than the orig- 
inal SI), and proving admissibility directly from Definition 3, rather than by Lemma 1. 

2.3 Simulating Commonly Used Data Structures 

We distinguish between families of data structures according to the field-pointers in use 
and their intended meaning. Table 1 shows the simulation-specific definitions required 
for each family of data structures: (i) the translation rj-, and (ii) the invariant SI. 

Table 1. Commonly used data structures and their simulations. The formula /tmc[/] for some 
field-pointer / is defined by Vv,vi,V2 : f(v,vi) A f(v,V2) => i>i = V2\ it ensures that / is 
interpreted as a partial function; the formula U2)] for some formula tp is defined by 

Vw, vi,V2 : ‘ip{vi,v) A i/>(u2, v) ^ vi = V2', the formula acyc[tp{vi,V2)] for some formula ip 
is defined by Svi,V2 : ip{vi,V2) => ~^TC[tp]{v2,vi). The formula unique[p] for some unary 
predicate p is defined by Vi>i , W2 : p(vi) A p(v2) ^ vi = V2. 



Data Structure 


Translation (77) 


Simulation Invariant (SI) 


SLL{n} 


n{vi,V 2 ) E{vi,V 2 ) 


func[n] 


TREE{l,r} 


l{vi,V2) 1-^ E{v2,Vi) a L{v 2) 
r(vi,V2) 1-^ E(v2,Vi) a R(v2) 


func[l\ A func[r\ 

A uns[l{vi,V2) V r(ui, n2)] 
A acyc\l(vi,V2) V r(m, n2)l 


UBTREE{.s} 


s(vi,V2) ^ E(v2,Vi) 


Mns[s] A acyc[s] 


DLL{f,b} 






DLL{f,b} b 
points to one of 
{pi,...,Pk} 


f{vi,V2) E{vi,V2) 

b{vi,V2) f{v2,vi) A B{vi) 

vVi=i„..,fcPi(«2)ABp,(ni) 


\/v\^V 2 : '^^2) ^ 

(/(U2,ni) V 

Afunc[f] A func[b] A uns[f] 
Auns[b] A Ai-i k unique[pi] 


DLL{f,b} b 
defined by 

‘fi(vi,V 2 ) 


f{vi,V2) E{vi,V2) 

b{vi,V2) f{v2,vi) A B{vi) 

\/B,p{vi) A (fi{vi,V2) 


Vvi,V2 : b(vi,V2) 

(f(v2,vi) V ifi(vi,V2)) A funclf] 
Afunc[b] A uns[f] A uns[b] 



The vocabulary rorg contains binary predicates that represent field-pointers and 
unary predicates that represent pointer variables and properties of elements. The names 
of binary predicates are specific to each example; unary predicates are usually named 
X, y, Pi. A formula in T> over the vocabulary TRep can use all unary predicates from 
Torg and additional simulation- specific unary predicates. 

SLL{n}. A singly linked list, where n denotes the field-pointer to the next element 
in the list. Rep is the set of all function graphs over TRep- The translation rj and the 
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mapping 77 are identity functions. Because n represents a field-pointer that can point to 
at most one memory location, SI requires that n be interpreted as a partial function. 
This simulation invariant allows us to handle any singly-linked list, including cyclic 
and/or shared lists. 

TREEjl, r}. The standard tree data- structure was explained earlier. 

UBTREEjs}. A tree with unbounded branching, where s(wi , W 2 ) is a successor rela- 
tion: V 2 is a successor of vi in the tree. Rep is the set of acyclic function graphs over 
T^ep- The simulation simply reverses the edges. 

DLL{f,b}. A doubly linked list, where / and b denote the forward and backward 
field-pointers. At this point, we refer only to the first simulation of DLL in Table 1 ; the 
purpose of other simulations is explained in Section 3.4. Tuep includes one simulation- 
specific unary predicate B. |i//] is the set of all function graphs over TRep such that a 
graph node labeled with B must not be shared i.e., it may have at most one incoming 
E-edge. 

The simulation invariant ensures that the backward pointer can only point to its /- 
predecessor (if an element does not have an /-predecessor, its backward pointer must 
be NULL). The translation represents the binary relation h{vi , V 2 ) with a unary predicate 
B{vi), which denotes the presence (or absence) of the backward pointer from element 

Vi. 

The full version of the paper contains more complex simulations of generalized 
trees and undirected graphs. 

3 Applications 

In this section, we provide motivation for potential applications. 



Meaning of Program Statements. We assume that the meaning of every atomic pro- 
gram statement st is expressed as a formula transformer, wpst '■ — > E(r), which 

expresses the weakest precondition required for st to produce structures that satisfy a 
given formula; i.e., for every p G C(r), the input structure before st satisfies wpst{p) 
if and only if the resultant structure produced by st satisfies ip. 

3.1 Hoare-Style Verification 

In Hoare-style verification, the programmer expresses partial-correctness requirements 
of the form {pre}st{post}, where pre and post are logical formulas in £(r) that ex- 
press the pre- and post-condition of a statement st. To handle loops, it is required that 
loop invariants be provided (also specified as logical formulas in C{t)). 

In conventional Hoare-style verification, the partial correctness of a statement st is 
ensured by the validity of the formula pre wpst {post ) . In our approach, the partial 

correctness of st is ensured by the validity of the formula SI A pre wpst {post A SI). 
The presence of SI on the left-hand side corresponds to an assumption that we are only 
interested in states in which SI holds; the presence of SI on the right-hand side - within 
the operator wpst - ensures that the execution of st yields a state in which SI again 
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holds. (This means that the verification system will report an error if execution of st can 
cause either post to be violated or the global invariant SI to be violated.) The partial 
correctness of sequential composition, conditional statements, and loop invariants is 
expressed by similar extensions to the standard Hoare-style approach. 

3.2 Abstract Interpretation 

The abstract-interpretation technique [2] allows conservative automatic verification of 
partial correctness to be conducted by identifying sound over-approximations to loop 
invariants. An iterative computation is carried out to determine an appropriate abstract 
value for each program point. The result at each program point is an abstract value that 
summarizes the sets of reachable concrete states at that point. 

In abstract interpretation we usually assume that the set of potential abstract values 
forms a lattice A. Also, concrete states can be represented as logical structures over 
vocabulary r. We introduce a non-standard requirement that all concrete states manipu- 
lated by a correct program satisfy SI. Thus, the abstract-interpretation system needs to 
issue a warning if this requirement is violated. We formalize this restriction of concrete 
states to l^/J as follows: 

A concretization function 7 : A ^ yields the set of concrete states that an 

abstract element represents. Similarly, the abstraction function a : 2 l^^I ^ A yields the 
abstract value that represents a set of concrete states. The partial order on A (denoted 
by C) satisfies a C a' 7 (a) C 7 ( 0 '). 

An element a G A is a lower bound of a set AT C A if, for every x € X, a Q x. The 
lattice is closed under a meet operator, denoted by n, which yields the greatest lower 
bound with respect to C; i.e., for every set X C A, FIX is a lower bound of X, and for 
every lower bound a of X, a C FIX. The concretization and the abstraction functions 
form a Galois connection between and A; i.e., for all a G A and X C 15 / 1 , 
a{X) C a X C 7(a). An additional assumption that we place on A is that the 

concretization can be expressed in £(r); i.e., for every a G A, there exists a formula in 
£(t), denoted by 7(a), that exactly represents a: s G |7(o)] if and only if s G 7(a). 
The tightest over- approximation in A of the set of concrete states specified by a formula 
ip (denoted by a{p)) can be computed by 

«(V 5 ) = n{a|(5/ A ^ 7 (a)}. (3) 

The following lemma states that a is indeed the tightest over-approximation: 

Lemma 2. For every formula ip, a(p) = o;(|(/?]). 

Now consider a statement st and an abstract value a that denotes the set of states before 
st. First, as in Section 3.1, we need to ensure that the global invariant SI is maintained 
by checking the validity of 



(SI A 7 (a)) ^ wpst(SI). (4) 

This requirement is non-standard in abstract interpretation, and reflects the fact that 
states violating SI are ignored. Thus, the abstract-interpretation system needs to issue 
a warning when this condition is violated. 
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We can also compute the (most-precise) effect in A of a statement st on an abstract 
input value a G A as the tightest over-approximation to the strongest post-condition of 
st: 

best[st]{a) = A 7(a)) wpst(j{a') A SI)} (5) 

Not surprisingly, the formulas for abstract interpretation given above are similar to the 
formulas for Hoare-style verihcation given in Section 3.1. 

In certain circumstances, 7 and validity checkers automatically yield algorithms 
for a{ip) (Eq. (3)), for checking that a statement st maintains SI (Eq. (4)), and for 
the abstract interpretation of a given statement in the most precise way (Eq. (5)). In 
particular, when the abstract domain A is a finite lattice, Eqns. (3), (4), and (5) can be 
used in a brute-force way (by generating a validity query for all elements in the domain). 
For more elaborate algorithms that handle finite-height lattices of infinite cardinality, 
the reader is referred to [12, 14]. 

3.3 Summary 

We now summarize the steps that a user of our approach has to go through to carry 
out a verihcation. The user must provide the inputs listed below. The hrst two inputs 
are standard for many verihcation methods. The other inputs are interdependent, as 
explained later. 

1. A scheme for encoding the program’s possible memory conhgurations as logical 
structures over vocabulary Torg ■ 

2. A description in C{rorg) of the weakest precondition for each statement. 

3. A decidable logic T> over Rep. 

4. The hrst-order mapping rj. 

5. A simulation invariant SI in £(rorg)- 

6 . A representation invariant RI in T>. 

Also, the user must ensure that (i) ( 77 , SI , RI) is admissible, and (ii) the initial program 
state satishes SI . 

We have implemented our approach as part of the TVLA system - a parametric 
system for performing abstract interpretation. The simulation package is initialized by 
reading the dehnition of rorg, Tuep, SI, and tr in FO(TC) from a hie provided by 
the user of TVLA. Currently, it uses WS2S as T> and binary trees as Rep, but it could 
easily be conhgured to use other decidable logics. The package takes formula p G 
FO(TC) and returns a translation into WS2S, computed by substitution. We have also 
implemented a SymbolicChecker that interfaces to the MONA tool. SymbolicChecker 
takes a formula tp G WS2S, and checks whether the formula is valid. We have used 
SymbolicChecker to perform abstract interpretation of sample programs. 

3.4 An Example of a Simulation Invariant 
That Supports Low-Level Heap Mutations 

In this section, we show an example program that temporarily violates a data- structure 
invariant, and show how to gradually generalize the simulation invariant to cope with 
the low-level mutations in this program. 
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Table 2. (a) a data type of colored doubly linked lists: each list node is marked with either RED 
or BLACK, (b) a program that divides a colored doubly linked list into two disjoint lists, pointed 
to by dllR and dllB, each of which contains nodes of one color only. 



/* dll.h */ 


/* divide. c */ 
DLL *dllR, *dllB; 


enum Color 


void divide {DLL *x) { 


{RED, BLACK}; 


DLL *t; 

dllB = dllR = NULL; 


typedef struct node { 


while (x ! = null) { 


struct node *f, *b; 


if (x->c == RED) { 


enum Color c; 


x->b = dllR; dllR = x; 


} *DLL; 


} else { 

x->b = dllB; dllB = x; 

} 

X = x->f; 

} 

if (dllR != NULL) { 




while (dllR- >b != NULL) { 

t = dllR; dllR = dllR->b; dllR->f = t; 

} 




} 

if (dllB != NULL) { 




while (dllB- >b != NULL) { 




t = dllB; dllB = dllB->b; dllB->f = t; 




}}} 


(a) 


(b) 



Consider the data-type in Table 2(a) that defines a doubly-linked list with an addi- 
tional bit for each element, represented by the unary predicates RED{v) and 
BLACK{v). The program divide (x) in Table 2(b) takes as input a doubly-linked 
list pointed to by x that may contain both red and black elements. We can verify that 
this program arranges the elements reachable from x into two disjoint doubly-linked 
lists pointed to by dll R and dllB, each of which contains elements of only one color. 
To automatically verify this property using one of the verification methods described 
above, we need to provide an appropriate simulation invariant. 

Basic Simulation. The basic simulation of DLL{f,b} in Table 1 defines simulation 
invariant S'/I. Consider the statement X- >b = dllR from the divide (x) program. 
The weakest precondition of x- >b = dllR with the basic simulation invariant SIi 
is wpi = yvi,V 2 ■ x{vi) A dllR{v 2 ) f{vi,V 2 )- If a structure satisfies wpi, then 
the result of applying x->b = dllR satisfies SI\. However, one of the structures 
that arises in divide (x) before the statement x- >b = dllR does not satisfy wpi. 
This means that a result of applying this statement, the structure shown on the left 
of Fig. 2(b), does not satisfy SI\. The reason is that the backward pointer from the 
second element does not point to the predecessor of this element, while SI\ allows 
only mutations that result in a structure where the backward pointer is either NULL or 
points to its /-predecessor. The backward pointer to the element pointed to by dllR 
cannot be recorded using only the B{y) predicate, provided by the basic simulation. 
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Fig. 2. Example structures produced by the divide (x) operation on Red-Black List from Ta- 
ble 2, are shown on the left. The corresponding representation structures are on the right, (a) 
initial structure and its basic simulation; (b) the structure after two iterations of the first while 
loop, and its fc-bounded simulation with the set of unary predicates {dllR, dllB}; the symbol || 
on the edge in the Org structure indicates the edge deleted by x- >b = dllR; (c) the stmcture 
at the end of the first loop and its formula-based simulation, as is defined in Eq. (6); (d) the result 
of divide (x) is two disjoint doubly linked lists that can be represented correctly using the 
basic simulation. 



k-Bounded Simulation. The second simulation of DLL{f, b} in Table 1 defines sim- 
ulation invariant Sl2- The weakest precondition of x- >b = dllR and SI2 is wp2 = 
Vvi,V2 ■ x{vi) AdllR{v2) ^ /(wi, ^2) Vd/li?(v2) Vdl/il(w2)- We can show that every 
structure that satisfies SI2 remains in SI2 after x->b = dllR, because SI2 is less 
restrictive than SIi, i.e., |S'/i] C |S'/2]. It allows mutations that redirect the backward 
pointer to an element pointed to by one of the variables pi, ... ,pk- Each pi is unique, 
that is, it may point to at most one element. The translation represents b{vi,V2) using 
unary predicates Bp^ (v), ... , Bp^ (v) in addition to B{v). Bp^ (v) indicates that there is 
a backward pointer from the element v to the element pointed to by pi. For example, 
in divide (x) these variables are dllR and dllB. The representation structure on 
the right of Fig. 2(b) uses predicate BdUR to correctly capture the result of the x- >b = 
dllR operation. 

The subsequent operation dllR = x in program divide (x) moves dllR for- 
ward in the list. The weakest precondition of dllR = x and SI 2 does not hold for 
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the structure in Fig. 2(b), the statement dllR = x makes the previous use of Bdim 
undefined, and SI2 enables the mutation only for structures in which dllR and x point 
to the same element, or no backward pointer is directed to the element pointed to by 
dllR, except maybe the backward pointer from its /-successor (which can be captured 
hy B{v)). 

Simulation with b-Field Defined by a Formula ip. The third simulation of DLL{f, b} 
in Table 1 defines simulation invariant SI^. SI^ defines the backward pointer either by a 
formula ip{vi,V2) G MSO-E or by its predecessor. The translation represents 6(ni, ^ 2 ) 
using a unary predicate Bcp{v) (in addition to B{v)), to indicate that the backward 
pointer from the element represented by vi points to some element represented by V2 
such that ip{vi,V 2 ) holds. In the example, ip{vi,V 2 ) is defined by 

‘p{vi,V2) = ipred{vi,V2) V (Pblack{vi,V2) ( 6 ) 

where ipc{vi,V2) for c G {RED, BLACK} is defined by 

‘Pc{vi,V2) = c(w2)AVn : TC[f]{v2,v) f\TC[f]{v,vi) f\{v ^ V2)f\{v ^ t;i) ^ ^c(n) 

It guarantees that v\ is the first element of color c reachable from V2 using /-edges. 
This simulation can capture all intermediate structures that may occur in divide (x) , 
some of which are shown in Fig. 2(c). 

To summarize, the high-level operation divide (x) preserves the data- structure 
invariant: if the input is a doubly-linked list that satisfies the basic simulation invariant 
SI\ in Table 1, then the result satisfies SI\ as well. The problem is that the implemen- 
tation of divide (x) performs low-level mutations that may temporarily violate the 
invariant (and restore it later). In this case, the mutation operation cannot be simulated 
correctly with the basic invariant SIi, but only with SI^. 

4 Related Work 

4.1 Decidable Logics for Expressing Data-Structure Properties 

Two other decidable logics have been successfully used to define properties of linked 
data structures: WS 2 S has been used in [3, 10] to define properties of heap-allocated 
data structures, and to conduct Hoare-style verification using programmer-supplied loop 
invariants in the PALE system [10]. 

A decidable logic called Lr (for “logic of reachability expressions”) was defined 
in [1]. Lj. is rich enough to express the shape descriptors studied in [13] and the path 
matrices introduced in [4]. Also, in contrast to WS 2 S, MSO-E and 3V(DTC[F^]) [9], 
Lr can describe certain families of arbitrary stores - not just trees or structures with 
one binary predicate. However, the expressive power of Lr is rather limited because it 
does not allow properties of memory locations to be described. For instance, it cannot 
express many of the families of structures that are captured by the shape descriptors that 
arise during a run of TVLA. Lr cannot be used to define doubly linked lists because 
Lr provides no way to specify the existence of cycles starting at arbitrary memory 
locations. 
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4.2 Simulating Stores 

The idea of simulating low-level mutations is related to representation simulation (Mil- 
ner, 1971) and data-structure refinement (Hoare [6]). In [6], the simulation invariant is 
defined over the representation structures; it denotes the set of representation structures 
for which rj is well-defined. This method has the obvious advantage of specifying the 
formulas directly, without the need for translation, whereas our method requires trans- 
lation, which may not be defined for some formulas and some logics. 

PALE [10] uses a hard-coded mapping of linked data-structures into WS2S, and 
uses MONA decision procedures. The simulation technique can be used to extend the 
applicability of WS2S to more general sets of stores than those handled in [10], for ex- 
ample, cyclic shared singly-linked lists, as described in Section 2.3, and also to simulate 
generalized trees and undirected graphs. 

5 Conclusion 

In this paper, we have described a technique that can increase the applicability of deci- 
sion procedures to larger classes of graph structures. We allow graph mutations to be de- 
scribed using arbitrary first-order formulas that can model deterministic programming- 
language statements, including destructive updates. 

We have implemented an interface between TVLA and MONA using simulation. 
This allows us to apply the simulation method as described in Section 3.2, thus, enabling 
precise abstract interpretation. 
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Abstract. We introduce a new BDD-like data structure called Hybrid- 
Restriction Diagrams (HRDs), for the representation and manipulation 
of linear hybrid automata (LHA) state-spaces, and present algorithms 
for weakest precondition calculations. This permits us to reason about 
the valuations of parameters that make safety properties satisfied. Ad- 
vantages of our approach include the ability to represent discrete state 
information and concave polyhedra in a unified scheme as well as to save 
both memory consumptions and manipulation times, when processing 
the same substructures in state-space representations. Our experimental 
results document its efficiency in practice. 

Keywords: data-structures, BDD, hybrid automata, verification, model- 
checking 



1 Introduction 

Linear hybrid automata (LHA) are state-transition systems equipped with con- 
tinuous variables that can change values with different rates [6] . They are impor- 
tant to computing and society because of their extensive modeling capability. In 
practice, such models are usually presented with symbolic constants, called pa- 
rameters, whose values may engender different behaviors of the models. Setting 
and calibrating these parameters is a crucial task for the engineers developing 
hybrid systems. Parametric safety analysis (PSA) of LHA can generate a sym- 
bolic characterization of the parameters’ valuations, called solutions, that make 
a model satisfy a safety property. Such a symbolic characterization, once con- 
structed, sheds important feedback information to engineers and can be used 
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repeatedly in the synthesis and verification of implementations of the corre- 
sponding model. 

Although, the emptiness problem of PSA solution spaces is undecidable in 
general [7], people have constructed experimental tools for the verification of 
LHA [1,3, 4, 6] based on the following three reasons. First, the verification problem 
is by itself interesting and challenging. Second, it may happen that there exist 
techniques and strategies which can solve many typical examples. Third, there 
is still an urgent need in industry for such tools to help engineers with fast 
prototyping of their compelx designs. 

Because LHA necessarily involves continuous variables, it is helpful and more 
efficient to represent and manipulate their state-space symbolically. Others have 
developed separate representation schemes for the discrete part and continuous 
part of LHA [1,3, 4, 6]. According to previous experiences [9,11,17-19], it can be 
more efficient to have a unified data-structure for both the discrete part and the 
continuous part. Moreover, the schemes used in [1,3,4, 6] only represent convex 
polyhedra in LHA state-space ( concave polyhedra have to be represented as sets 
of convex ones) and suffer from the limitation to share common representation 
structures in different convex polyhedra. 

In this work, we extend BDD [9,11] for the representation and manipulation 
of LHA state-spaces. A BDD [11] is an acyclic graph, with a single source and 
two sinks for true and false respectively. Each internal node is labeled with a 
Boolean decision atom and has a tree-child and a false-child. A BDD works as 
a decision diagram for state-space membership. Any sequence of decision atoms 
labeled on a path in a BDD must follow a predefined total ordering. 

The term BDD-like data-structure is coined in [19] to call the many BDD ex- 
tensions in the literature. BDD-like data-structures have the advantage of data- 
sharing in both representation and manipulation and have shown great success 
in VLSI verification industry. The same structure following two decision path 
prefixes will only be represented once. Also, a pattern of operand substructures 
will only be manipulated once. The manipulation result will be saved and re- 
turned immediately the next time when the same pattern is encountered. These 
features of BDD-technology lead to not only savings in memory consumptions 
but also speed-up in manipulations. 

One of the major difficulties to use BDD-like data-structures to analyze LHAs 
comes from the unboundedness of the dense variable value ranges and the un- 
boundedness of linear constraints. To explain one of the major contribution of 
this work, we need to discuss the following issue first. In the research of BDD-like 
data-structures, there are two classes of variables: system variables and decision 
atoms [19]. System variables are those used in the input behavior descriptions. 
Decision atoms are those labeled on each BDD nodes. For discrete systems, these 
two classes are the same. But for dense-time systems, decision atoms can be dif- 
ferent from state variables. For example, in CDD (Clock-Difference Diagram) [10] 
and CRD (Clock- Restriction Diagram) [19], decision atoms are of the form x — x' 
where x and x' are system variables of type clock. Previous work on BDD-like 
data-structures are based on the assumption that decision atom domains are of 
finite sizes. Thus we need new techniques to extend BDD-like data-structures 
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to represent and manipulate state-spaces of LHAs. Our innovations include us- 
ing constraints, like — 3A -\- x — Ay (where A,x,y are dense system variables), 
as the decision atoms and using total dense orderings among these atoms. In 
this way, we devise HRD (Hybrid-Restriction Diagram) and successfully extend 
BDD-technology to models with unbounded domains of decision atoms. 

In total, we defined three total dense orderings for HRD decision atoms (sec- 
tion 5). We also present algorithms for set-oriented operations (section 6) and 
symbolic weakest precondition calculation (section 7), and a procedure for sym- 
bolic parametric safety analysis (section 7). We have also developed a technique, 
which prunes state-space exploration based on parameter space characterization 
and enhances the performance of our tool by orders of magnitude (section 8). 
Desirably, this technique does not sacrifice the precision of parametric safety 
analysis. Especially, for one benchmark in our experiments (see section 9), the 
state-space exploration does not converge without this technique! To our knowl- 
edge, nobody else has come up with a similar technique. Finally, we have im- 
plemented our ideas in our tool red 5.0 and reported our experiments to see 
how the three dense-orderings perform and how our implementation performs in 
comparison with HyTech 2.4.5 [13] and TReX 1.3 [1,3]. 

2 Related Work 

People have used convex subspaces, called convex polyhedra, as basic unit for 
symbolic manipulation. A convex polyhedron characterizes a state-space of an 
LHA and can be symbolically represented by a set of constraints like aix\ + 

. . . -b anXn ~ c [4-0]. Two commonly used representations for convex polyhe- 
dra in HyTech are polyhedras and frames in dense state-space [13]. These two 
representations neither are BDD-like nor can represent concave state-spaces. 
Data-sharing among convex polyhedra is difficult. 

In 1993 [22], Wang, Mok, and Emerson discussed how to use BDD with de- 
cision atoms like Xi + c < xj + d to model-check timed automata. In the last 
several years, people have explored in this approach in the hope of duplicating 
the success of BDD techniques [9,11] in hardware verification for the verification 
of timed automata [2,8,10,14-19]. Especially, our HRD can be seen as variation 
of CDD [10] and extension of CRD [17,18] for timed systems. In [10], CDD only 
served as a recording device in that reachable state-space representations were 
analyzed using DBM [12] and then converted to CDD. In [19-21], a full set of 
verification algorithms (including forward/backward reachability analysis, nor- 
malization, and full TCTL model-checking procedures) for CRD were reported. 

For parametric safety analysis, Annichini et al have extended DBM [12] to 
PDBM for parametric safety analysis of timed automata [1,3] and implemented 
a tool called TReX, which also supports verification with lossy channels. 

3 Parametric Safety Analysis of Linear Hybrid Antomata 

A linear hybrid automaton (LHA) [6] is a finite-state automaton equipped with 
a finite set of dense system variables which can hold real- values. At any moment. 



298 Farn Wang 




Fig. 1. Fischer’s timed mutual exclusion algorithm in LHA 



the LHA can stay in only one mode (or control location). In its operation, one of 
the transitions can be triggered when the corresponding triggering condition is 
satisfied. Upon being triggered, the LHA instantaneously transits from one mode 
to another and sets some dense variables to values in certain ranges. In between 
transitions, all dense variables increase their readings at rates determined by the 
current mode. Since LHA has been well-studied in the literature, we leave its 
definition in our full version report. In figure 1, we have drawn a version of the 
Fischer’s mutual exclusion algorithm for a process. There are two parameters a 
and (3 that control the behavior of the processes. In each mode, local clock x 
increases its reading according to a rate in [4/5, 1] or [1, 11/10]. 

Our verification framework is called parametric safety analysis (PSA) prob- 
lem. Such a problem instance, denoted PSA(A,t 7 ), consists of an LHA A and a 
safety state-predicate r] and asks for a symbolic characterization of all param- 
eter valuations, called solutions, that make all reachable states satisfy p. The 
general parametric safety analysis problem can be proved incomputable with a 
straightforward adaptation of the undecidability proof in [7]. 

4 HRD (Hybrid-Restriction Diagram) 

For efficiency of manipulation, we have the following requirements on constraints 
used to represent a convex polyhedron. Given a constraint like ~ c, (1) 

afs are integers such that gcdjui | 1 < z < n; yf 0} = 1, (2) x\, . . . ,Xn are 
system variables, (3) and (4) c is either a rational number or oo 

such that when c = oo, ~= We shall call aiXi' an LH-expression and 
“(~,c)” an LH-upperbound. Formally, a convex polyhedron can be defined as a 
mapping from the set of LH-expressions to the set of LH-upperbounds. For a 
given X, the set of all LH-expressions and the set of convex polyhedra are both 
infinite. For any two (~,c) and (~',c'), (~,c) is more restrictive than (~',c'), 
denoted (~, c) C (~', c'), iff c < c' or (c = c'A ~= “<”A 

To construct BDD-like data-structures, three fundamental issues have to be 
solved. The first is the domain of the decision atoms; the second is the range of 
the arc labels from BDD nodes; and the third is the evaluation ordering among 
the decision atoms. For modularity of presentation, we shall leave the discussion 
of the evaluation orderings to section 5. In this section, we shall assume that we 
are given a decision atom evaluation ordering. 

We decide to follow an approach similar to the one adopted in [19]. That 
is, the decision atoms of HRD are LH-expressions and the arcs are labeled with 
LH-upperbounds. A node label aiXi with a corresponding outgoing arc label 
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(a) HRD with coefficient ordering (b) HRD with magnitude ordering 




(c) HRD with dictionary ordering 



Fig. 2. Examples of HRD 



(~,c) constitute the constraint of ~ c. A source-to-sink path in an 

HRD thus represents the conjunction of constituent constraints along the path. 
Figure 2(a) is an HRD example for the concave space of 

{x2 — X3 < —hjl V — 5 A — 2 x 2 + 10x3 < 48/7) /\ A — X2 + IOX3 < 9 

assuming that —5A— 2x2 + 10x3 precedes X2 — X3 (in symbols —5A— 2x2 + 10x3 + 
X2 — X3) and X2 — X3 precedes A — X2 + IOX3 in the given evaluation ordering. 
In this example, the system variables are A, X2,X3 while the decision atoms are 
X2 — X3, — 5A — 2x2 + 10x3, and A — X2 + IOX3. 

Definition 1. HRD (Hybrid- Restriction Diagram) Given a dense variable set 
X = {xi,...,Xra} and an evaluation ordering ^ among normalized LH-ex- 
pressions of A, an HRD is either true or a tuple (x, (/3i, Di), . . . , (/3m, Dm)) 
such that 

• X is a normalized LH-expression; 

• for each 1 < i < m, /3i is an LH-upperbound s.t. (<, 00) yf /?i C /32 C . . . C 
(dm] and 

• for each 1 < / < m, is an HRD such that if Di = (xj, . . .), then x + x^. 

For completeness, we use “()” to represent the HRD for false. ■ 

In our algorithms, false does not participate in comparison of evaluation 
orderings among decision atoms. Also, note that in figure 2, for each arc label 
(~, c), we simply put down ~ c for convenience. 

5 Three Dense Orderings among Decision Atoms 

In the definition of a dense-ordering among decision atoms (i.e., LH-expressions), 
special care must be taken to facilitate efficient manipulation of HRDs. Here we 
use the experience reported in [19] and present three heuristics in designing the 
orderings among LH-expressions. The three heuristics are presented in sequence 
proportional to their respective importances. 

HEURISTICS I : It is desirable to place a pair of converse LH-expressions next 
to one another so that simple inconsistencies can be easily detected. That is, LH- 
expressions ^iXi and ~o-iXi are better placed next to one another in the 
ordering. With this arrangement, inconsistencies like — Xi+3x2 < — 5Axi— 8x2 < 
0 can be checked by comparing adjacent nodes in HRD paths. To fulfill this 
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requirement, when comparing the precedence between LH-expressions in a given 
ordering, we shall first toggle the signs of coefficients of an LH-expression if its 
first nonzero coefficient is positive. If two LH-expressions are identical after the 
necessary toggling, then we compare the signs of their first nonzero coefficients 
to decide the precedence between the two. 

With this heuristics, from now on, we shall only focus on the orderings among 
LH-expressions whose first nonzero coefficients are negative. 

HEURISTICS II : According to the literature, it is important to place strongly 
correlated LH-expressions close together in the evaluation orderings. Usually, in- 
stead of a single global LHA, we are given a set of communicating LHAs, each 
representing a process. Thus it is desirable to place LH-expressions for the same 
process close to each other in the orderings. Our second heuristics respects this 
experience. Given a system with m processes with respective local dense vari- 
ables, we shall partition the LH-expressions into m -|- 1 groups: Go, Gi, . . . , Gm- 
Go contains all LH-expressions without local variables (i.e., coefficients for local 
variables are all zero). For each p > 0, Gp contains all LH-expressions with a 
nonzero coefficient for a local variable of process p and only zero coefficients for 
local variables of processes p+ 1 , . . . , to. Then our second heuristics requires that 
for all 0 < p < TO, LH-expressions in Gp precede those in Gp+i, . . . , Gm- 

HEURISTICS III : If the precedence between two LH-expressions cannot be 
determined with heuristics I and II, then the following third heuristics comes 
to play. The design of this heuristics is a challenge since each of Go,...,Gm 
can be of infinite size. Traditionally, BDD-like data-structures have been used 
with finite decision atom domains. Carefully examining the BDD manipulation 
procedures, we found that the sizes of the domains do not matter. What really 
matters is the ordering among all decision atoms so that we know which atoms 
is to be decided next along a decision path. Thus, we invent to use the following 
three dense-or derings among LH-expressions. 

Dictionary Ordering: We can represent each LH-expression as a string, as- 
suming that the ordering among a;i, . . . is fixed and no blanks are used in 
the string. Then we can use dictionary ordering and ASCII ordering to decide 
the precedence among LH-expressions. For the LH-expressions in figure 2, we 
then have — 5 A — 2x2 + < A — X2 + lOxa X2 — X3 since ’ precedes ‘A’ 

and ‘A’ precedes ‘x’ in ASCII. The corresponding HRD in dictionary ordering 
is in figure 2(c). One interesting feature of this ordering is that it has the po- 
tential to be extended to nonlinear hybrid constraints. For example, we may say 
cos(a;i) + X2 X2~ X2X3 in dictionary ordering since ‘c’ precedes ‘s’ in ASCII. 

Coefficient Ordering: Assume that the ordering of the dense variables is fixed 
as si, . . . ,Xm- In this ordering, the precedence between two LH-expressions is 
determined by iteratively comparing the coefficients of dense variables si , . . . , s„ 
in sequence. For the LH-expressions in figure 2, we then have — 5A— 2s2-l-10s3 ^ 
X2 — X3 A — X2 + IOS3. The HRD in this ordering is in figure 2(a). 

Magnitude Ordering: This ordering is similar to the last one. Instead of com- 
paring coefficients, we compare the absolute values of coefficients. We iteratively 
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• first compare the absolute values of coefficients of Xi, and 

• if they are equal, then compare the signs of coefficients of Xi. 

For the LH-expressions in figure 2, X 2 ~ x^ < A — X 2 + 10a;3 -< —bA— 2x2 + Wx^ 
in this magnitude ordering. The corresponding HRD is in figure 2(b). 

6 Set-Oriented Operations 

Please be reminded that an HRD records a set of convex polyhedra. For con- 
venience of discussion, given an HRD, we may just represent it as the set of 
convex polyhedra recorded in it. Definitions of set-union (U), set-intersection 
(n), and set-exclusion (— ) of two convex polyhedra sets respectively represented 
by two HRDs are straightforward. For example, given HRDs D\ : {C 11 C 2 } and 
D 2 ■■ {C2,C3}, DiDD 2 is the HRD for {C 2 }; D 1 UD 2 is for {Ci, C 2 , Cs}; and D 1 -D 2 
is for {Cl}- The complexities of the three manipulations are all 0(|Di| • ID 2 I). 

Given two convex polyhedra Ci and C2, Ci C2 is a new convex polyhedron 
representing the space-intersection of Ci and C 2 . Formally speaking, for decision 
atom Cl AC 2 (I]i«*a;i) = if lA C 2 (I]i«i 2 ;i); or 

C2(Si®i®i) otherwise. Space-intersection (A) of two HRDs Di and D 2 , in sym- 
bols Di A D2, is a new HRD for {Ci A C2 | Ci € T>i;C2 G D 2 }. Our current 
algorithm of the manipulation has complexity 0 (|Dip|D 2 p). 

Given an evaluation ordering, we can write HRD-manipulation algorithms 
pretty much as usual [9,11,15,19]. For convenience of presentation, we may 
represent an HRD (m, {/3i,Bi), {jSn, Bn)) symbolically as (u, (/?*, Hi)i<i<n)- 
A union operation union(R, D) can then be implemented as follows. 



set W; /* database for the recording of already-processed cases */ 
union(H, D) { 

if i? = false, return D; else ii D = false, return B] 

•F := 0; return rec_union(i?, D); 

} 

rec_union(H, D) where B = {u,{j3i,Bi)i<i<n), D = {v,{aj,Dj)i<j<rn) { 

if B is true or D is true, return true] else if 3F, {B, D, F) G 'F, return F] (1) 
else if u A u, construct F := {u, (/3i, rec_union(Hi, Zl))i<i<„); 
else if u A M, construct F := {v, (oj, rec_union(H, 
else { 

i := n; j := m; F := false; 
while i > 1 and j > 1, do { 

if Pi = aj, { F := F U {u, (/3i, rec_union(Rj, D^))); i ; j ; } 

else if A C aj, { F \= F U {u, (aj,Dj))] j := j - 1; } 
else if aj C A, { F := F U {u, {P^, B^); i := i - 1; } 

} 

if i > 1, F := F U (u, (/3i, Fi)i</i<i); 

A j > I, F := F U {u,{ai,Di)i<k<j); 

} 

F := F U {{B, D, F)}; return F; 



} 



( 2 ) 
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Note in statement (1), we take advantage of the data-sharing capability of 
HRDs so that we do not process the same substructure twice. Set •f' is maintained 
in statement ( 2 ) to record the processing result the first time the substructure 
is encountered. The algorithms for H and — are pretty much the same. The one 
for space intersection A is much more involved and is not discussed here. 



7 Weakest Preconditon Calculation and Symbolic PSA 



Our tool red 5.0 runs a backward reachability analysis by default. Suppose we 
are given an LHA A. There are two basic procedures in this analysis procedure. 
The first, xtion(T),e), computes the weakest precondition from state-space rep- 
resented by HRD D through discrete transition e from mode q to q' . Assume 
that the variables that get assigned in e are yi, . . . ,yk and there is no variable 
that gets assigned twice in e. The characterization of xtion(Z?, e) is 



y{q) A r(e) A 3yi . . . 3yk{D A Ai<i<kVi G 7r(e, yi)) ^ 

Assume that delta_exp(D) is the same as D except that all dense variables 
X are replaced by a; -I- respectively. Here 5x represents the value-change of 
variable x in time-passage. For example, delta_exp(2a;i — 3 a ;2 < 3/5) = 2a;i -I- 
25,a - 3X2 - 3<5,, < 3/5. 

Assume that for each dense variable x, 7 ( 5 , x) specifies the rate interval of x 
in q. The second basic procedure, time(D, q), computes the weakest precondition 
from D through time passage in mode q. It is characterized as 



y{q) A 35x^35x^ ■ ■ ■ 35x^35 



5 > 0 A delta_exp(T)) 

^ ^l<i<n\^{q,Xi) — {di,d!^)^Xi G {diS^ d^S) 



One basic building block of both xtion() and time() is for the evaluation of 
3x{D{x)) and is implemented as follows. 



3x(D(x)) = var_del(xtivity(Z3, x), {x}). 

Procedure var_del(Z?, A') eliminates all constraints in D involving variables in 
set X' . Procedure xtivity(£>, x) adds to a path every constraint that can be 
transitively deduced from two peer constraints involving x in the same path in 
D. The algorithm of xtivity() is as follows. 



set R, S; 

xtivity(D,x) { i? := 0; return rec_xtivity(Z3); } 
rec_xtivity(I?) { 

if D is true or false, return D; else if 3{D,D') G R, return D'; 
else /* assume D = {ax + e, {Pi,Di), {(i^, Dm)) */ { 

S := 0; D' := Ui<i<m ^ rec_xtivity_given(A, ax + e, j3i)\ 

R\= Ryj {{D,D^}\ return D'\ 

} 

} 

^ y £ [d,d'] = d < y < d' . y £ {d,d'\ = d < y < d' . y £ [d,d') = d < y < d' . 
y £ {d,d') = d < y < d' . 
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rec_xtivity_given(£), aa; + e, /3) { 

if D is true or false, return D] else if 3(D, D') G S, return D'; (3) 

else /* assume D = (te + e', (/3i, L>i), . . . , (/3m, ^’m)) */ { 
if ab < 0, 

II f + e'/3i A rec_xtivity_given(_Di, a® + e, /3) \ 

■- Ui<j<m |5|g/ gcd(a, b) + |a|e'/ gcd(a, fe)((|6|/3 + |a|/3i)/ gcd(a, b)) J ’ 

else D' := Ui<i<m rec_xtivity^iven(_Di, ax + e,/3); 

S' := 5 U {(13, i7)}; return Z3'; (4) 

} 

} /* (|6|/3 + |a|/3i)/gcd(a,6) is a shorthand for the new upperbound obtained 
from the xtivity of ax + e/3 and hx + e'(3i. */ 



Thus we preserve all constraints transitively deducible from a dense variable 
before it is eliminated from a predicate. This guarantees that no information will 
be unintentionally lost after the variable elimination. 

Note that in our algorithm, we do not enumerate all paths in HRD to carry 
out this least fixpoint evaluation. Instead, in statement (3), our algorithm fol- 
lows the traditional BDD programming style which takes advantage of the data- 
sharing capability of BDD-like data-structures. Thus our algorithm does not 
explode due to the combinatorial complexity of path counts in HRD. This can 
be justified by the performance of our implementation reported in section 9. 

Assume that the transition set is E and the unsafe states are in mode qj. 
With the two basic procedures, then the backward reachable state-space from 
the unsafe states in ^r] (represented as an HRD) can be characterized by 

IfpZ. (t±me{^ri, qf) U time(xtion(Z, e), g)) 

Here IfpZ.F(Z) is the least fixpoint of function F{) and is very commonly used 
in the reachable state-space representation of discrete and dense-time systems. 
After the fixpoint is successfully constructed, we conjunct it with the initial con- 
dition and then eliminate all variables except those parameters (geometrically 
speaking, projecting the reachable state-space to the dimensions of the parame- 
ters). Suppose the set of dense parameters is H. The characterization of unsafe 
parameter valuations is thus 

var_del(7 A lfpZ.(time(^? 7 , qj) U Ue=(9,9')6£^ time(xtion(Z, e),q)),X - H) 

The solution space is characterized by the complement of this final result. 

In our experience, normalizing convex polyhedron representations in HRDs 
may consume most of the resources. At this moment, our implementation does 
not enforce a normal form. We only try to eliminate some redundancies that can 
be derived from two peer constraints in the same convex polyhedron. 

Polyhedron containment checking in an HRD is also important in controlling 
the size of the HRD. Specifically, we want to check if a path in the HRD specifies a 
polyhedra that is subsumed by another path’s HRD. In [19], a sufficient condition 
called SCR (Straightforward Containment Requirement) was presented for the 
containment checking of zones. We have adapted SCR for HRDs. Specifically, 
given D = (v, {aj, Z3j)i<j<m), SCR(Z3) = (v, (aj, Dj — Uj<fc<m ^k)i<j<m)- 
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Lemma 1. Given any HRD D, D and SCR(I?) characterize the same real space. 

■ 

In our current implementation, when we need to eliminate polyhedras con- 
tained by peer polyhedras in an HRD, we perform SCR to each node in the HRD 
in a bottom-up fashion. 

8 Pruning Strategy Based 

on Parameter Space Construction (PSPSC) 

We have also experimented with techniques to improve the efhciency of para- 
metric safety analysis. One such technique, called PSPSC, is avoiding new state- 
space exploration if the exploration does not contribute to new parametric so- 
lutions. A constraint is static iff all its dense variables are parameters. Static 
constraints do not change their truth values. Once a static constraint is derived 
in a convex polyhedron, its truth value will be honored in all weakest precon- 
ditions derived from this convex polyhedron. All states backwardly reachable 
from a convex polyhedron must also satisfy the static constraints required in the 
polyhedron. Thus if we know that static parameter valuation TL is already in 
the parametric solution space, then we really do not need to explore those states 
whose parameter valuations fall in P[. 

With PSPSC, our new parametric safety analysis procedure is as follows. 



PSA__with_PSPSC(A, 77 ) { 

D := time(^? 7 , gj); D := false] P := var_del(D, X — H)] 
while D 7 ^ false, do { 

D := D U D] D := \Je=(q,q')&E time(xtion(Z), e),q)] 

D ■.= DA(^P) A hD)]_. \ 

P := P U var_del(/ A D, X — H); 

} 

return 



} 



( 5 ) 



In the procedure, we use variable P to symbolically accumulate the para- 
metric evaluations leading to the unsafe states in the least fixpoint iterations. In 
statement (5), we check and eliminate in D those state descriptions which cannot 
possibly contribute to new parametric evaluations by conjuncting D with ~^P. 

One nice feature of PSPSC is that it does not sacrifice the precision of our 
parametric safety analysis. 

Lemma 2. Ti. is a parametric solution to A and p iff Td satisfies the return 
result o/PSA_with_PSPSC(A,77). 

Proof: Note that the intersection at line (5) in procedure PSA_with_PSPSC() 
only stops the further exploration of those states that do not contribute to new 
parameter-spaces. Those parameter-spaces pruned in line (5) do not contribute 
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Table 1. Comparison with HyTech w.r.t. number of processes 



direction 


Denchmarks 


rates 

used 


m 


HyTech 

2.4.5 


red 5.0 (backward) 


dictionary | coetticient | magnitude 


coetticient 


no pypycj 


F5F5C 


backward 


Fischer’s 

mutual 

exclusion 

(m 

Drocesses) 


[4/5. IJ 
[1,11/10] 


2 


0.23s 


0.10s/17k 


0.11s/17k 


0.11s/17k 


0.07s/16k 


3 


2.40s 


1.83s/81k 


1.75s/74k 


1.23s/59k 


0.70s /44k 


4 


28.04s 


20.29s/320k 


23.85s/269k 


12.38s/215k 


5.14s/163k 


b 


ayra 


278.8s/1420k 


354.1s/1149k 


162.0s/1034k 


31.36s/474k 


6 


ayra 


2846s/5848k 


9923s/8796k 


1485s/4000k 


168.6s/1170k 


CrC 

(gate 

-1-controller 
■\-m trains) 


0.0], [1.1 
[9, 10] 
[-10,-9] 


2 


oyw 


0.79s/103k 


0.68s/101k 


0.68s/101k 


0.76s/94k 


3 


UJM 


11.48s/806k 


8.85s/616k 


8.84s/616k 


11.48s/530k 


4 


UjM 


248.5s/6046k 


184.9s/4249k 


186.1s/4249k 


252.5s/2820k 


b 


Dyra 


3095s/37093k 


4883s/25841k 


4900s/25841k 


6527s/19234k 


•eactor 
(controller 
-1-m rods) 


[1.1| 


2 


0.056s 


0.08s/ 19k 


0.07s/ 19k 


0.06s/ 19k 


0.05s/ 15k 


3 


0.33s 


0.41s/51k 


0.38s/52k 


0.37s/52k 


0.22s/41k 


4 


2.61s 


3.10s/187k 


2.69s/186k 


2.71s/186k 


1.42s/155k 


5 


31.29s 


41.47s/1042k 


37.03s/1039k 


36.89s/1039k 


18.67s/884k 


6 


647. 8e 


951.5s/8228k 


866.9s/8191k 


839.3s/8191k 


461.8s/6941k 


UyMA/CD 

(bus 

■\-m senders) 


[1.1| 


2 


ayra 


0.98s/42k* 


1.47s/125k 


0.5 7s/ 34k 


0.56s/33k 


3 


oyw 


O/M 


5076s/2407k* 


121.5s/807k 


0.66s/105k 


4 


UjM 


oyM 


oyM 


O/M 


2.47s/378k 


b 


Dyra 


oyra 


oyra 


oyra 


9.77s/1192k 


6 




oyra 


oyra 


oyra 


40.58s/3513k 


forward 


Fischer’s 

mutual 


same as 
above 


2 


0.34s 


0.10s/20k 


0.10s/20k 


0.10s/19k 


0.08s/18k 


3 


37.89s 


O/M 


22.10s/561k 


19.18s/654k 


5.59s/538k 


[J[TC 


same as 
above 


2 


3.29s 


2.29s/192k 


1.41s/95k 


1.43s/95k 


0.44 s/ 84k 


3 


ayra 


O/M 


oyra 


O/M 


6.35s/418k 


reactor 


[i.i| 


2 




oyra 


oyra 


D7M 


O/M 


USMA/CD 


[i.i| 


2 


0.19s 


0.19s/29k 


0.17s/29k 


0.17s/29k 


0.25s/33k 


3 


2.63s 


1.81s/102k 


1.64s/101k 


1.62s/101k 


2.61s/106k 


4 


68.75s 


20.07s/370k 


17.49s/378k 


17.52s/378k 


27.03s/378k 


b 


ayra 


268.0s/1905k 


240.3s/1906k 


242.2s/1906k 


331.9s/1910k 


b 


Dyra 


3889s/11725k 


3123s/11525k 


3155s/11525k 


4163s/11552k 



s: seconds; k: kilobytes of memory in data-structure; O/M: Out of memory; 



because they are already contained in the known parameter constraints P and 
along each exploration path, the parameter constraints only get restricter. ■ 
PSPSC can help in pruning the space of exploration in big chunks. But in 
the worst case, PSPSC does not guarantee the exploration will terminate. In 
section 9, we shall report the performance of this technique. Especially, for one 
benchmark, the state-space exploration cannot converge without PSPSC. 

9 Implementation and Experiments 

We have implemented our ideas in our tool red which has been previously re- 
ported in [15-19] for the verification of timed automata. We have also carried out 
experiments and compared with HyTech 2.4.5 [13] and TReX 1.3 [1,3]. Details 
about the benchmarks can be found in our full version report. Experiments are 
conducted on a Pentium 4M 1.6GHz/256MB running LINUX. 

Comparison with HyTech 2.4.5: We compare performance in both forward and 
backward reachability analyses. The performance data of HyTech 2.4.5 and red 
5.0 with dictionary ordering (no PSPSC), coefficient ordering (no PSPSC), mag- 
nitude ordering (PSPSC), and coefficient ordering with PSPSC is reported in 
table 1. The experiment, although not extensive, does show signs that HRD- 
technology (with or without PSPSC) can compete with the technology used 
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Table 2. Performance comparison with TReX w.r.t. number of processes 



benchmarks 


concurrency 


Forward 


Backward 


TR5X 

1.3 


red 5.0 


THSX 

1.3 


red 5.0 


magnitude 


coett.+FyFyCJ 


magnitude 


coeff.+PSPSC 


Fischer’s 

mutual 

exclusion 

(m 


2 procs 


1.12s 


0.07s/17k 


0.07s/15k 


S.96s 


0.08s/13k 


0.04s/13k 


3 procs 


■Q/M 


1.86s/137k 


0.78s/79k 


TT7S 


0.66s/43k 


0.49s/43k 


4 procs 


~UJM 


197.9s/2714k 


16.92s/539k 




5.81s/180k 


3.58s/158k 


5 procs 


~UJM 


NTS 


752.7s/5254k 


N/A 


59.27s/945k 


24.71s/658k 


h procs 


~UJM 


N7A 


N7^ 


N/A 


567.1s/4341k 


170.3s/2798k 


reactor 
(m rods) 


2 rods 


O/M 


O/M 


O/M 


N/A 


0.06s/19k 


0.05s/15k 


3 rods 


~oTm 


D/M 


O/M 


TT/A 


0.37s/52k 


0.22s/41k 


4 rods 


~UJM 


D/ra 


O/M 


N/A 


2.71s/186k 


1.42s/155k 


5 rods 


Tj/ra 


D/ra 


O/M 


N/A 


36.89s/1039k 


18.67s/884k 


6 rods 


TJ/ra 


D/ra 


a/ra 


N/A 


839.3s/8191k 


461.8s/6941k 



s: seconds; k: kilobytes of memory in data-structure; 
0/M: Out of memory; N/A: not available; 



in HyTech 2.4.5. For all the benchmarks, HRD-technology demonstrates better 
scalability w.r.t. concurrency complexity. 

Finally, PSPSC cuts down the time and memory needed for parametric safety 
analysis. Especially, in forward analysis of the general railroad benchmark with 
three trains, without PSPSC, the state-space exploration fails to converge. This 
shows very good promise of this technique. 

Comparison with TReX 1.3: Since TReX 1.3 only supports the verification of 
systems with clocks, parameters, and lossy channels, we choose the following 
two benchmarks. The first is Fischer’s protocol with all clocks in the uniform 
rate of 1. The second is the Nuclear Reactor Controller. The performance data 
is shown in table 2 for both forward and backward analysis. 

Two additional options of red 5.0 were chosen: coefficient evaluation ordering 
with PSPSC and magnitude evaluation ordering without. We marked N/A (not 
available) with higher concurrencies when we feel that too much time (like more 
than 1 hour) or too much memory (20MB) has been consumed in early fixpoint 
iterations. Although the data set is still small and incomplete, but we feel that 
the HRD-technology shows promise in the table. 
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Abstract. We present a new abstraction-based framework for deciding 
satisfiability of quantifier-free Presburger arithmetic formulas. Given a 
Presburger formula (j>, our algorithm invokes a SAT solver to produce 
proofs of unsatishability of approximations of 0. These proofs are in 
turn used to generate abstractions of (j) as inputs to a theorem prover. 
The SAT-encodings of the approximations of (j> are obtained by instan- 
tiating the variables of the formula over hnite domains. The satisfying 
integer assignments provided by the theorem prover are then used to 
selectively increase domain sizes and generate fresh SAT-encodings of (p- 
The efficiency of this approach derives from the ability of SAT solvers 
to extract small unsatisfiable cores, leading to small abstracted formu- 
las. We present experimental results which suggest that our algorithm is 
considerably more efhcient than directly invoking the theorem prover on 
the original formula. 



1 Introduction 

Decision procedures for arithmetic over the integers have many applications in 
formal verification. For instance, the quantifier-free fragment of Presburger arith- 
metic has been used in infinite-state model checking [9] , symbolic timing verifi- 
cation [2] , and RTL-datapath analysis [8] . Unfortunately, the satisfiability prob- 
lem for quantifier- free Presburger arithmetic is known to be NP-complete [25]. 
Consequently, efficient techniques and tools for solving such problems are very 
valuable. 

In this paper, we present an abstraction-based algorithm for the satisfia- 
bility solving of quantifier- free Presburger formulas (QFP formulas for short). 
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Presburger arithmetic is the first-order theory of linear arithmetic over the non- 
negative integers. It was shown to be decidable in [27], although the best-known 
decision algorithms have complexity triply exponential in the size of the for- 
mula [29]. For many applications, however, the quantifier-free fragment of Pres- 
burger arithmetic suffices. 

The algorithm we propose receives as input a QFP formula and attempts 
to satisfy it over a small range of integers, through a Boolean encoding of the 
formula and interaction with a SAT solver. In case the Boolean formula is found 
to be unsatisfiable, the SAT solver is able to supply us with a proof of this fact, 
in the form of a (small) unsatisfiable core. This in turn can be used to pin- 
point the linear arithmetic constraints of the original QFP formula that cannot 
be satisfied over our bounded domain. This (likewise small) set of linear con- 
straints represents an abstraction of the original QFP formula: whenever these 
constraints cannot be satisfied over the whole of the non-negative integers, the 
original formula is unsatisfiable as well. The advantage of this operation is that 
it is generally much easier to solve the abstracted formula than the original one. 
We can do so by using any of the existing decision procedures for Presburger 
arithmetic. The abstracted formula may however be satisfiable; in that case, we 
increase the size of our bounded integer domain to accommodate the satisfying 
integer assignment supplied by the decision procedure. We then repeat the whole 
process until the original formula is shown to be either satisfiable or unsatisfiable. 

Note that while the domain over which we work (the integers) is infinite, 
termination is guaranteed provided the inner Presburger decision procedure we 
use is itself complete. 

Our implementation of this algorithm uses the SAT solvers zChaff [32] and 
SMVSAT^, as well as the commercial constraint-solving package CPLEX [17]. 
Our experimental results, which include both random formulas as well as indus- 
trial benchmarks, suggest that our algorithm is considerably more efficient than 
directly invoking our inner CPLEX-based Presburger decision procedure on its 
own. 

Related Work. There currently exist a number of algorithms and tools for 
solving QFP, some of which we discuss in Section 2. We refer the reader to the 
excellent surveys [18, 14] for a more detailed presentation of the matter. While 
no single technique is found to dominate all others, Ganesh et al. [14] report that 
ILP-based methods perform best in most contexts. To the best of our knowledge, 
however, no existing tools make use of the sort of abstractions that we have just 
described. 

McMillan and Amla [23] use a related technique to accelerate model checking 
algorithms over finite Kripke structures. More precisely, they invoke a bounded 
model checker to decide which state variables should be made visible in order 
to generate a ‘good’ abstraction for the next iteration of model checking. Our 
approach differs from theirs in several respects: we work over an infinite domain, 
we use a Presburger decision procedure instead of a model checker, and we seek 
to eliminate constraints rather than variables. 

^ We thank Ken McMillan for providing us with this proof-generating SAT solver. 
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Henzinger et al. [15] use a theorem prover to refute counterexamples gener- 
ated by predicate abstraction in software verification. When the counterexample 
is shown to be spurious, the proof is ‘mined’ for new predicates to use in pred- 
icate abstraction. Although similar in spirit, this approach differs significantly 
from ours in both the domain of application and the techniques used. 

Our framework also bears certain similarities with automated counterexam- 
ple-guided abstraction refinement [20]. Chauhan et al. [10], for example, use a 
SAT solver to derive an abstraction sufficient to refute a given abstract coun- 
terexample in model checking. The successive abstractions they obtain, however, 
are cumulative, in that state variables made visible at some point are never subse- 
quently re-hidden. In contrast, our approach generates a fresh abstraction every 
time. 

2 Preliminaries 

2.1 Boolean Satisfiability 

We begin by recalling some well-known facts concerning (propositional) Boolean 
formulas and Boolean satisfiability. 

Let 61,62 ,... be Boolean variables. A literal is either a bi or its negation. 
A (Boolean) clause is a disjunction of zero or more literals — by convention, the 
empty clause is equivalent to False. 

Let 0 be a Boolean formula with free variables 61 , . . . ,6„. It is possible to 
manufacture a Boolean formula cnf(i^) with free variables 6 i,..., 6„+p (where 
the 6 „+i, . . . , 6„_|_p are fresh Boolean variables), such that 

— cnf((^) is in conjunctive normal form (CNF): cnf((^) = Ajli where each 
Bj is a Boolean clause, 

— cnf(</)) is satisfiable iff (f) is satisfiable; more precisely, 36„+i, . . . , 6„+p.cnf((/)) 
is tautologically equivalent to (j), and 

— The number of variables and the size of cnf((()) are both linear in the size of 
(j). 

Linear-time algorithms for computing cnf(^) are well-known; see, for instance, 
[26]. 

A SAT solver is an algorithm which determines, given a Boolean formula (p 
in CNF, whether p is satisfiable. In the affirmative case, the SAT solver will 
produce a satisfying assignment for <p. If, on the other hand, (f> is unsatisfiable, 
the SAT solver can be required to produce a proof of unsatisfiability [33,23]. 
Such a proof in turns yields an unsatisfiable core, i.e., an unsatisfiable subset of 
clauses of 4>. In practice, SAT solvers tend to generate small unsatisfiable cores. 
The SAT solvers we have used in our experiments are zChaff [32] and SMVSAT. 

2.2 Presburger Arithmetic 

Preshurger arithmetic can be defined as the first-order theory of the structure 
(N, 0, 1, -k), where N denotes the set of non-negative integers. In this paper, 

we focus on the quantifier-free fragment of Presburger arithmetic. 
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More precisely, let x\^X 2 , ■ ■ ■ be variables ranging over non-negative integers. 
A linear constraint is any expression of the form 

n 

a-iXi ~ c , 

i=l 

where each and c are integer constants (i.e., Qi,c G Z), and ~ is a comparison 
operator (~ G Quantifier-free Presburger formula (henceforth 

QFP formulas) are Boolean combinations of linear constraints: 

Definition 1. The collection of QFP formulas is defined inductively as follows: 

— Any linear constraint is a QFP formula, and 

— If 4>i and (j )2 are QFP formulas, then so are 4>i A </> 2 , and fix V 412 - 

Remark 1. Note that an integer variable x can easily be represented in Pres- 
burger arithmetic by two non-negative variables: x = x+ — x-. It is equally 
straightforward to encode Boolean variables as equalities of the form x = 1, 
together with constraints of the form a; ^ 1 conjoined at the outermost level. 

We are interested in the satisfaction problem for QFP formulas: given a QFP 
formula 4: is there an assignment of non-negative integers to the variables of 4 
under which 4 evaluates to True! 

Ganesh et al. [14] present a comprehensive survey of decision procedures 
for satisfiability solving of QFP formulas. The abstraction-based algorithm we 
describe in Section 3 can be used in conjunction with any such decision proce- 
dure; it is however desirable that the procedure also generate satisfying integer 
assignments. We briefly describe in the next two paragraphs how we use the com- 
mercial package CPLEX [17], together with the SAT solver zChaff, to achieve 
this^. Put succinctly, our decision procedure iteratively refines Boolean encod- 
ings of the QFP formula based on satisfying assignments from zChaff that are 
inconsistent with the linear arithmetic. This technique is known as ‘lazy explica- 
tion of axioms’, and was originally proposed by the authors of the tools ICS [16, 
12], CVC [11,4,6], Math-SAT [22,3], and Verifun [13]. 

CPLEX uses integer linear programming techniques (and in particular the 
simplex algorithm) to decide whether a conjunction of linear arithmetic con- 
straints is satisfiable. When such a conjunction is satisfiable, CPLEX also gen- 
erates a satisfying integer assignment. 

Our CPLEX-based QFP solver is implemented as follows. Given a QFP for- 
mula 4i we first extract a ‘Boolean skeleton’ 4booi from (/>, by simply replacing 
each linear constraint in 4 with a fresh Boolean variable. We then invoke zChaff 
to determine whether 4booi is satisfiable. If it is not, then 4 cannot possibly be 
satisfiable either, and we terminate. Otherwise, we take the satisfying assignment 
(say b) provided by zChaff and form a corresponding conjunction of linear con- 
straints, which we then submit to CPLEX. If CPLEX is able to find a satisfying 

^ We also experimented with the tool LP_SOLVE [21] but encountered difficulties with 
certain formulas, for which LP_SOLVE appears to be unsound. 
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assignment for this conjunction of linear constraints, then this assignment also 
satisfies (j), and we are done. Otherwise, we augment the Boolean formula (j)booi 
with a ‘blocking’ clause ruling out the Boolean assignment b produced earlier by 
zChaff. We then repeat the procedure with the new Boolean formula until the 
satisfiability of (f) is established or refuted. 

In the remainder of this paper, let us refer to our implementation of the 
above algorithm as ‘PresSolver’. 

Remark 2. Among other existing QFP solvers, let us mention (i) the OMEGA 
tool^ [28] , which converts a QFP formula into disjunctive normal form and then 
applies an extension of the Fourier-Motzkin linear programming algorithm on 
each disjunct; (ii) the automata-based tool LASH [31]; and (iii) the previously 
mentioned ILP-based tools LP_SOLVE, ICS, and CVC^. 

Our abstraction-based framework can be used in conjunction with any of 
these decision procedures, and indeed we intend to carry out a number of ex- 
periments with them in the near future. We have so far mostly worked with the 
commercial package CPLEX mainly because of its high reliability, completeness 
over the integers, and efficiency. While ICS and CVC are not at present complete 
over the integers, and are therefore unsuitable for our purposes, their implemen- 
tors inform us that they are planning to release complete versions of these tools 
in the near future. 

3 Abstraction-Based Presburger Satisfiability Solving 

We now present the main contribution of this paper, a SAT-based algorithm that 
generates increasingly precise abstractions of QFP formulas. Our abstractions are 
obtained by eliminating linear constraints from QFP formulas in a conservative 
manner. The choice of which constraints to eliminate is guided by an iterative 
interaction with a SAT solver. 

Let (/) be a QFP formula. If we view the linear constraints occurring in (j) as 
atomic propositions, we can convert </> into a satisfaction-equivalent QFP formula 
4>' in CNF by invoking the procedure described in Section 2.1. Note that (j>' ni^iy 
require the introduction of fresh Boolean variables; as discussed in Remark 1, 
these are modeled as new constrained integer variables. For the remainder of this 
section, let us therefore assume without loss of generality that the QFP formula 
<j) is given to us in CNF. 

Write (j) = A^i Q; "''’ith each Cj a (Presburger) clause. (A Presburger clause 
is a disjunction of linear constraints.) Let xi, . . . , be the collection of variables 
appearing in (j). Suppose we are given a function size which assigns to each 
variable Xi a positive integer size{xi). Intuitively, size{xi) denotes the maximum 
number of bits allowed in the binary representation of Xi] put another way, 
size{xi) implicitly represents the constraint Xi < 

® In fact, the OMEGA tool handles the full first-order theory of Presburger arithmetic, 
not just its quantifier-free fragment. 

^ We note that while much of the work on ILP has in fact focused on 0-1 ILP (see, 
e.g., [5]), the latter is less useful here because of the presence of unbounded variables. 
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Let stand for the formula Cjf\/\^^j. Xi < where Jj is the set of 

indices of the variables that appear in Cj . We can encode as an equivalent 
Boolean formula bool(C®*^®), as follows. For each variable Xi appearing in Cj, 
allocate size{xi) Boolean variables, one for each of the bits allowed in the binary 
representation of Xi. The linear constraints in Cj are then encoded as Boolean 
formulas on these Boolean variables. Note that the encoding uses exact (i.e., 
arbitrary-precision) bit-vector arithmetic. 

Next, let = cnf(bool((7|®^®)) denote the CNF representation of this 

Boolean formula, ensuring in the process that all newly introduced auxiliary 
Boolean variables are fresh. Write 

m m 

^ ^ (JS^ze = f\ . 

i=i i=i 

(Here cbe stands for ‘CNF Boolean Encoding’.) The two main points are: 

— The QFP formula is satisfiable iff the Boolean formula cbe(i^®®^®) is 

satisfiable, and 

— Any satisfying assignment for is also a satisfying assignment for <j). 

Observe that since each is a conjunction of Boolean clauses, cbe(^®*^®) 
is itself in CNF; let us therefore write cbe((^’’®^®) = Afc=i^fe- Note that it is 
straightforward when building cbe((/)^*‘‘’®) to maintain a table recording, for each 
clause Ak of cbe((()^*^®), its ‘origin’ orig(Afc) = Cj in <l>. While it is possible for 
several Presburger clauses to yield a common Boolean clause A^, orig(Afc) only 
records one of them. It is clear that, whenever orig(Afe) = Cj, any satisfying in- 
teger assignment for Cp^^ yields a corresponding satisfying Boolean assignment 
for Ak- 

We now come to the crux of the paper, our abstraction-based algorithm for 
solving QFP formulas, which we call ‘ASAP’ (Abstraction-based Satisfiability 
Algorithm for Presburger). ASAP takes as input a QFP formula 4> in CNF 
with free variables x\, . . . ,Xn, and either outputs a satisfying assignment of non- 
negative integers to these variables, or declares <j) to be unsatisfiable. ASAP 
repeatedly invokes as subroutines both a SAT solver (SMVSAT) and a quantifier- 
free Presburger arithmetic decision procedure (PresSolver, which itself is based 
on CPLEX and zChaff). 

ASAP first attempts to satisfy an over-constrained version of <j) in which 
the integer variables are only allowed to range over a bounded domain. This is 
achieved by encoding the over-constrained QFP formula as a Boolean formula, 
which is then given as input to SMVSAT. If a satisfiable assignment is found, 
then (f) is clearly also satisfiable, and an integer witness is easily extracted from 
the satisfying Boolean assignment produced by SMVSAT. Otherwise, SMVSAT 
returns an unsatisfiable core, which is in turn used to pinpoint a subset of the 
clauses of </> as unsatisfiable over the chosen bounded domain. The conjunction of 
these clauses is clearly a conservative abstraction of (j), in that if it is unsatisfiable 
then so is </>. We therefore run PresSolver on this abstracted QFP formula. If 
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it is found to be satisfiable, we increase the size of our bounded domain to 
accommodate the satisfying assignment supplied by PresSolver, and repeat the 
whole process. We continue until a conclusive judgment on the satisfiability of 
(j) is obtained. 

Section 4 presents experimental evidence which suggests that ASAP is con- 
siderably more efficient than PresSolver on its own. ASAP is described in pseudo- 
code in Figure 1. 

Algorithm ASAP 

Input: QFP formula (j> in CNF with free variables 
Output: satisfying assignment for (j) or ‘UNSAT’ 

let sizeixi) = 1 for each i 
repeat forever 

run SMVSAT on cbe(<^““‘‘) 
if cbe(0®“®) is satisfiable then 

returnCcorresponding satisfying assignment for 
else 

let ^ unsatisf iable core of cbe((^‘’*^'‘) 

let i> = /\keK°"S{Ak) 
run PresSolver on ip 
if Ip is unsatisf iable then 
return ( ‘ UNSAT ’ ) 
else 

let Vi, . . . ,Vn be a satisfying assignment for ip 
let size{xi) = max{\\og 2 {vi)], size{xi)) for each i 

endrepeat 

Fig. 1. ASAP: an abstraction-based Presburger satisfiability solving algorithm 

Theorem 1. The algorithm ASAP described in Figure 1 is correct and always 
terminates. 

Proof. We first examine the issue of correctness. Observe that ip is less con- 
strained a QFP formula than (p, in that it contains only a subset of the clauses 
of (p. Thus if Ip is unsatisfiable, then so is <p. On the other hand, suppose that 
ASAP terminates with an assignment v of non-negative integers to the variables, 
u is a satisfying assignment for for some instance of size. But since any 

satisfying assignment for is automatically a satisfying assignment for (p, v 
is indeed a satisfying assignment for <p, as required. 

We now claim that, in any execution of ASAP, we never see two identical 
instances of the QFP formula ip. Since ip is always a conjunction of a subset of the 
clauses of <p, it only has finitely many possible instantiations, which immediately 
entails the termination of the algorithm. 

It remains to establish our claim. Suppose, on the contrary, that two identical 
instances of ip are observed in a given execution of ASAP. The first time around, 
an unsatisfiable core Afceic of cbe((/)‘‘®^®) is obtained, and the function size 
is then subsequently increased to size' to accommodate a satisfying integer as- 
signment V for Ip. In other words, ip = Afceic orig(Afe) and [U] evaluates to 
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True. (Here denotes the QFP formula ip conjoined with linear constraints 
of the form Xi < .) 

Some iterations later, we encounter a second unsatisfiable core of 

cbe((/)^®®^®") such that orig(A/). Writing C'j(;)=orig(A/), we have that 

^T{l) evaluates to true for every I G L, since rJ is a satisfying assignment for 

^ each iteration of the repeat loop increases (pointwise) the function 

size, we conclude that size" ^ size', and therefore that [U] also evaluates 

to true for every I G L. Let b be the Boolean assignment to the bit-variables 
prescribed by size" corresponding to the integer assignment v. We immediately 
get that Ai\b] evaluates to True for each I G L, and therefore that /\i^j^Ai is 
satisfiable, contradicting our earlier hypothesis. □ 

Remark 3. We record the following observations concerning ASAP: 

— Any ILP-based solver, such as PresSolver, offers the option of generating 

satisfying integer assignments that moreover minimize some linear ‘objec- 
tive function’ f{x\, . . . ,Xn). In the case at hand, it is desirable that sat- 
isfying assignments be as compact as possible; more precisely, they should 
ideally minimize the number of new bits that are required for their repre- 
sentation. A simple linear function which approximates this requirement is 
f{x\, . . . ,Xn) = X^r=i function that PresSolver 

uses. Note that while minimizing the number of bits subsequently leads to 
easier queries for the SAT solver, the minimization requirement is an addi- 
tional burden for the ILP-based Presburger solver. 

— Note that, while our inner Presburger decision procedure PresSolver gen- 
erates satisfying integer assignments, many theorem proving tools do not. 
Nonetheless, we could still use a pure decision procedure in ASAP by simply 
requiring, on every iteration, that the function size be increased by 1 for 
each of its arguments. 

— An examination of the proof of termination reveals that the number of itera- 
tions of the main loop of our algorithm is bounded by the number of different 
subsets of clauses of the original Presburger formula (j), a quantity which is 
exponential in the size of (j). One can in fact guarantee at most a polynomial 
number of iterations, by invoking the fact that, if (j) admits a solution at all, 
4> admits a solution in which each variable is assigned a value that can be 
represented with polynomially many bits [7, 19]. Since each iteration of the 
main loop increases the number of bits used by at least one, at most poly- 
nomially many iterations are required to reach this bound. Note nonetheless 
that since a SAT solver is invoked each time, the overall worst-case time 
complexity of ASAP remains exponential. 

4 Implementation and Experimental Results 

We implemented our tool ASAP within the UCLID verification system [30], 
which is implemented in Moscow ML [24], a dialect of Standard ML. In imple- 
menting ASAP, we used PresSolver as a decision procedure for QFP formulas. 
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and SMVSAT as a proof-generating SAT solver. SMVSAT outputs a proof as a 
set of resolution steps. The set of all original (i.e., not introduced by resolution) 
clauses that appear in this proof constitute the unsatisfiable core. ASAP inter- 
acts with PresSolver and SMVSAT using a file-based interface. The total running 
time for ASAP is the cumulative time spent in generating input for SMVSAT 
and PresSolver, in running SMVSAT and PresSolver, and in analyzing their 
output. 

We performed an experimental evaluation to investigate whether using Pres- 
Solver within ASAP could achieve a significant speed-up over directly using 
PresSolver on the input formula. We used two benchmark sets of QFP formu- 
las in CNF : randomly generated formulas, and formulas generated in real-world 
software verification problems. 

The experiments were performed on a Linux workstation with an AMD 
Athlon 1.5 GHz dual-processor CPU, with 3 GB of RAM. Both ASAP and 
PresSolver are single-threaded. 

4.1 Results on Random Benchmarks 

We ran both ASAP and PresSolver on a set of 45 randomly generated formulas 
with a timeout of 1200 seconds. The formulas included both unsatisfiable and 
satisfiable instances. We generated the formulas recursively as follows: for each 
node, we randomly select either a boolean operator (A, V, or a relational 
operator (=, <, etc.). In case of a relation, we generate a linear constraint, ran- 
domly selecting the coefficients of the variables and the constant term from the 
range [0, 100]. The number of variables is fixed, but the number of linear con- 
straints can vary, allowing us to generate over-constrained formulas that have a 
reasonable likelihood of being unsatisfiable. The depth of nesting of Boolean op- 
erators in the formula is bounded, eventually forcing the selection of a relational 
operator. 

Figure 2 compares, for each formula, the total run-time of ASAP with the 
run-time of PresSolver. In the plot, the x-coordinate of each point is the time 
taken by ASAP, and the y-coordinate is the time taken by PresSolver. We also 
plot the diagonal line y = x: points above the diagonal correspond to benchmarks 
on which ASAP outperforms PresSolver, while points below it correspond to 
benchmarks on which ASAP is outperformed. 

The results show that ASAP outperforms PresSolver on most of the bench- 
marks, completing on all benchmarks within a minute while PresSolver times 
out on 6 benchmarks. On larger benchmarks for which PresSolver terminates, 
we notice that ASAP performs an order of magnitude better; the speed-up is 
more than a factor of 100 on some benchmarks. PresSolver outperforms ASAP 
on some smaller formulas, but ASAP completes within 4 seconds on all of these; 
the reason for PresSolver’s superior performance on these is simply because the 
original formulas themselves are fairly small (about 80 clauses), so that ASAP’s 
extra overhead is comparatively more costly. 

We also investigated how the maximum size of any abstracted formula ip, 
measured in terms of number of CNF clauses, compares with that of the original 
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Total Time for ASAP (sec.) 

Fig. 2. Comparing ASAP against PresSolver on random benchmarks. The 

timeout was 1200 seconds. Note the log scale on both axes 



formula (f). This was done by computing the ratio of the number of clauses in r/> 
with the number of clauses in (p. We found the smallest such ratio to be 0.009, 
where the original formulas has 648 clauses and the largest abstraction contains 
just 6. The largest ratio was 0.206, with 354 clauses in the original formula, and 
73 in the largest abstraction. This reduction is the main reason for the speed-ups 
achieved over directly using PresSolver on 4>. 

4.2 Results on Software Verification Benchmarks 

We now report on the second set of experiments performed on formulas gen- 
erated from software verification. We used a suite of formulas generated in the 
WiSA project® in checking for format string vulnerabilities. The benchmarks in- 
clude 20 formulas, both satisfiable and unsatisfiable, in an extension of QFP with 
uninterpreted functions. Uninterpreted functions were first eliminated using Ack- 
ermann’s technique [1], and both ASAP and PresSolver were run on the resulting 
QFP formula. Each generated QFP formula is an arbitrary Boolean combination 
of linear constraints. The number of variables in the formulas ranges from 33 to 
43, the number of linear constraints ranges between 197 and 267, and the total 
number of clauses ranges between 695 and 1054. 

PresSolver was unable to solve any of these formulas within a timeout of one 
hour. On the other hand, ASAP was able to complete on all but one benchmark 

® http://www.cs.wisc.edu/wisa 
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Table 1. ASAP results on WiSA benchmarks. A indicates that ASAP timed 
out after 1 hour. (PresSolver timed out on all instances.) 



Benchmark 


1 ASAP Time 


Max. Ratio 
#(■)/) clauses) 
#((^ clauses) 


PresSolver Time 
(sec.) 


Total Time 
(sec.) 


s-5-3 


199.70 


230.44 


0.139 


s-5-5 


239.48 


262.00 


0.142 


s-5-6 


193.53 


215.56 


0.148 


s-5-7 


274.08 


303.45 


0.142 


s-6-5 


286.88 


310.05 


0.125 


s-6-6 


261.93 


292.05 


0.132 


xs-5-2 


408.56 


434.16 


0.120 


xs-5-3 


596.82 


631.77 


0.122 


xs-5-5 


970.01 


994.11 


0.124 


xs-5-6 


934.18 


959.76 


0.126 


xs-5-7 


978.83 


1008.53 


0.124 


xs-5-9 


126.44 


150.90 


0.155 


xs-5-10 


127.93 


155.01 


0.155 


xs-5-11 


1034.01 


1062.88 


0.161 


xs-6-2 


648.44 


683.79 


0.103 


xs-6-5 


1051.57 


1116.78 


0.106 


xs-6-6 


1008.85 


1043.09 


0.110 


xs-6-9 


132.04 


158.97 


0.135 


xs-6-10 


184.51 


220.18 


0.135 


xs-6-11 


* 


* 


- 



within the timeout. Table 1 shows the results of running ASAP on the WiSA 
benchmarks. We give the total time taken by ASAP and the time taken by 
PresSolver on abstractions generated by ASAP, summed over all invocations of 
PresSolver. We notice that on all benchmarks, the time taken by PresSolver is 
the bottleneck for ASAP, accounting for over 90% of the total time on most 
benchmarks. ASAP takes on the order of a few minutes to solve each formula. 
SMVSAT took less than 5 seconds on each occasion, with the remaining time 
spent in generating encodings and parsing output from SMVSAT and PresSolver. 

The abstractions generated by ASAP were fairly compact. The last column 
in Table 1 shows the ratio of the number of clauses in the largest abstraction ip 
generated by ASAP to the number of clauses in the original formula (p. We see 
that this ratio is roughly between 10 and 15 percent. Again, the compactness of 
the abstraction is the main reason why ASAP is able to solve these formulas, 
while PresSolver is unable to complete on any. Finally, we note that the number 
of iterations taken by ASAP ranged between 19 and 28. 

5 Conclusion 



We have presented a novel abstraction-based approach to the satisfiability solv- 
ing of quantifier-free Presburger formulas. Our experimental results, over both 
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random formulas as well as industrial benchmarks, indicate that embedding a 
theorem prover for QFP formulas within our framework can achieve significant 
speed-ups over directly using the prover on the input formula. 

In the future, we would like to experiment with a number of other Pres- 
burger solvers, and in particular ICS and CVC. While these tools are at present 
incomplete over the integers, and hence not suitable for our purposes, their im- 
plementors inform us that new releases will remedy this, something we look 
forward to. 

Another research direction would be to investigate whether the abstraction- 
based methodology presented here could be applied to other (higher-order?) 
logics and theories, possibly over different domains such as bit vectors, real 
numbers, etc. 
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Abstract. Model checking of infinite state systems is undecidable, 
therefore, there are instances for which fixpoint computations used in 
infinite state model checkers do not converge. Given a widening operator 
one can compute an upper approximation of a least fixpoint in finite 
number of steps even if the least fixpoint is uncomputable. We present 
a widening operator for automata encoding integer sets. We show how 
widening can be used to verify safety properties that cannot be verified 
otherwise. We also show that the dual of the widening operator can be 
used to detect counter examples for liveness properties. Finally, we show 
experimentally how the same technique can be used to verify properties 
of complex infinite state systems efficiently. 



1 Introduction 

Symbolic verification of large and complex infinite state systems may require 
an unreasonable number of fixpoint iterations. Furthermore, since the problem 
of verification of temporal properties of infinite state systems is in general un- 
decidable, the fixpoint computations might not converge at all. To overcome 
this problem one can use approximations. Abstract interpretation framework [9] 
provides a technique known as widening, to compute a least fixpoint’s upper 
bound in finite time. Widening has been successfully applied to Polyhedra based 
verification of systems specified with arithmetic constraints [10,13,7]. On the 
other hand, similar work for the automata encoding of arithmetic constraints 
has been limited. We present a widening operator for automata encoding of in- 
teger sets as described in [3]. We also show how to verify properties of infinite 
state systems using an approximate fixpoint computation based on our widening 
technique. Note that, for these properties the exact fixpoint computation does 
not converge. Finally, we show experimentally how the same technique can be 
used to improve the efficiency of our infinite state model checking tool. Action 
Language Verifier (ALV) [8], and compare its performance with BRAIN [15]. 

Most reachability properties can be formulated as least fixpoints over sets 
of states. If the state space is infinite, these fixpoints may not be computable. 
Widening is a well known technique [9, 10] that facilitates the convergence of a 
fixpoint computation by extrapolating an upper approximation of the exact least 

* This work is supported by NSF grants CCR-9984822 and CCR-0341365. 



R. Alur and D.A. Peled (Eds.): CAV 2004, LNCS 3114, pp. 321-333, 2004. 
© Springer- Verlag Berlin Heidelberg 2004 




322 



Constantines Bartzis and Tevfik Bultan 



fixpoint. In [10, 12] a widening operator was defined for systems whose transition 
relation and sets of states can be described by linear arithmetic constraints, 
symbolically represented as sets of convex Polyhedra. This technique has been 
successfully used in the analysis of various types of systems such as concurrent 
systems (by extending it to Presburger arithmetic formulas [7]), synchronous 
programs and linear hybrid systems [13]. 

Systems described by Presburger arithmetic formulas can also be symbol- 
ically represented by finite automata [6,16]. Experimental results in [3], indi- 
cate that the automata representation often outperforms the polyhedral rep- 
resentation. However, until lately the use of approximation techniques for the 
automata representation has been limited. In a series of recent papers [14, 5] 
“Regular Model Checking” (RMC) has been defined as a framework for algo- 
rithmic verification of systems with transition relations represented by a regular 
length-preserving relation on strings. Typical examples of such systems are lin- 
ear parameterized networks of processes. Widening techniques have been used 
to compute the set of reachable configurations of such systems in finite time. 
However, the arithmetic relations considered in [14, 5] are restricted because of 
the unary encoding used (only addition of constants is allowed). Our goal is to 
develop a widening technique for automata representing Presburger formulas. 

Another way to deal with non-termination of exact infinite state model check- 
ing is to compute the reflexive-transitive closure R* of the transition relation R 
of the system (or an upper approximation of R*). Given R*, one can compute the 
set of states forward or backward reachable from an initial set of states I with 
a single image computation. Therefore computing R* is at least as hard as com- 
puting sets of reachable states. In fact, it may be the case that the set of states 
reachable from I is regular but R* is not. Nevertheless, ways to approximate R* 
for regular transition relations R have been studied in [5, 11,4]. In particular, in 
[4] the authors present a generic technique for computing R* (sometimes pre- 
cisely) for transition relations representing arithmetic relations. The technique 
is generic in the sense that it can handle relations that are not in a restricted 
form. Our technique is also generic but is based on widening instead of iterating 
relations. As we show in Section 4 our approach can verify some properties of 
systems with non-regular R*. The algorithm in [4] is complicated and involves 
determinization of automata which is potentially an expensive operation even 
when heuristics are used to improve efficiency. There is no full implementation of 
this algorithm that allows it to be used for verification applications. As we show 
in Section 5, our widening technique can be used to verify properties of com- 
plex systems efficiently. Finally, note that one can use our widening technique 
to iterate transducers but the opposite is not true. 

The rest of the paper is organized as follows. First we discuss fixpoint compu- 
tations and how widening technique can be used to help them converge in Section 
2. Next, we briefly present the automata representation we are using for integer 
sets satisfying Presburger formulas in Section 3. In Section 4 we formally define 
our widening operator for arithmetic automata, prove some interesting proper- 
ties and illustrate how it can be applied to a set of characteristic systems. Finally 
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in Section 5 we present experimental results that demonstrate how our widening 
technique can be used to verify properties of complex systems efficiently. 

2 Fixpoint Computations and Widening 

We consider systems whose states can be described by the values of v inte- 
ger variables x\,...,Xy. A set of states of the system, S' C Z“, is a relation 
on the V integer variables. The transition relation of the system, R C is 
a relation on the current state and next state variables x\, . . . ,Xy,x'i, . . . ,x'^. 
In particular, we consider systems where S and R can be represented as Pres- 
burger arithmetic formulas, i.e., S = {{x\, . . . ,Xv) \ . . . , a;„)} and R = 

{(xi, . . . . . . ,x'„) | . . ,,Xy,x'-^, . . • where (j)s and (j)R are Pres- 

burger arithmetic formulas. In Section 3, we show how to represent S and R 
symbolically using finite automata. 

We define the pre-condition of S with respect to R, pre{S,R) C Z", as the 
set of states that can reach some state in S in one step. Similarly we define 
post{S, R) C Z*' as the set of states reachable from some state in S in one step. 
One can compute pre{S, R) and post{S, R) as follows 

pvci^S^R^ { (xi , . . . , x^,) I — x'^] ^ 4 ^r) 

post{S,R) = {{xi,...,Xy) I {3xi..3xy.(j)s ^ (pR)[xi^xi,...,x'^^x^]}- 

where is the formula generated by substituting z for y in ip. Hence, to 

compute pre{S, R) and post{S, R) we need to be able to compute three oper- 
ations: conjunction, existential variable elimination and renaming. In Section 
3, we will show that these operations can be implemented using an automata 
representation for R and S. 

We can formulate the verification problem of invariants based on pre- and 
post-condition functions as follows. We are given a set of initial states I, a 
transition relation R, pre- and post-condition functions pre{S, R) and post{S, R), 
and a property P. To verify the property we have two alternatives. The first is 
to compute FR{I), the set of states forward reachable from the initial states I, 
and then check whether 

FR{I) C P (1) 

The second way is to compute BR{^P) the set of states backward reachable 
from the negation of the property and then check that 

BR{^P) n / = 0 (2) 

Since the problem is undecidable, we might not be able to compute FR{I) or 
BR{^P) exactly. In that case we can follow a conservative approach by replacing 
FR{I) and BR{-^P) by over approximations FR(I)'^ 3 and BR{-^P)'^ 3 

BR{^P) in equations 1 and 2 respectively. Note that, we may not be able to 
verify a property that actually holds when we use approximations. Below we 
describe how to compute FR{I) and Fi?(J) + . Computation of BR{^P) and 
BR{-^P)'^ is similar. 
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The set FR{I) of reachable states from I is the least fixpoint of the functional 
XX . I U post{X, R). This fixpoint is the limit of the sequence So, Si ,. . where 
So = I and S'i+i = Si U post{Si, R). This sequence may not converge. However 
we can compute an over approximation FR{I)~^ which is the limit of a new 
sequence So,S [,. . such that for each i. Si C S'' and the sequence Sg, S(, . . . 
converges after a finite number of iterations. We compute the S's by using a 
widening operator V, which satisfies the following property: 

Given two sets A and B, AU B C AVB. (3) 

Now we can define S' as: 

^ ( Si if 0 < z < s 

" I 5'_iV(S'_i Upost{S'_„R)) if z > s 

where s is the seed of the widening sequence. Experiments show that higher 
seeds are likely to result in better approximations. The goal is to find a widening 
operator such that the sequence Sq,S[,... converges as fast as possible to a 
fixpoint that is as close as possible to the exact set of reachable states. We 
present a widening operator for automata representing integer sets. 



3 Automata Representation for Integer Sets 



The representation of Presburger formulas by finite automata has been stud- 
ied in [6,16,3]- Here we briefly describe finite automata that accept the set of 
natural number tuples that satisfy a Presburger arithmetic formula on v vari- 
ables. The representation we discuss below can be extended to integers using 2’s 
complement arithmetic [3]. We present the construction for natural numbers to 
simplify the presentation. Our implementation of widening technique and our 
verification tool also handles negative integers. 

We encode numbers using their binary representation. A z;-tuple of natural 
numbers (rzi, zz 2 , ..., zz„) is encoded as a word over the alphabet {0,1}“, where 
the ith letter in the word is {bn, bi 2 , ..., biv) and bij is the ith least significant bit 
of number nj. Given a Presburger formula <j), we construct a finite automaton 
FA{(j))={K,X,6,e,F) that accepts the language L((/)) over the alphabet X = 
{0, 1}“, which contains all the encodings of the natural number tuples that satisfy 
the formula. K is the set of automaton states, X is the input alphabet, 6 : 
K X X ^ K is the transition function, e G K is the initial state, and F C K is 
the set of final or accepting states. 

For equalities, ai ■ Xi = c) = {K, X, S, e, F), where 



K = {k \ tti < k < QiV 0 < k < — c V — c < fc < 0} U {sink}, 

ai<0 ai>0 

A = {0,1}“, e = -c, F={0}, 



6{k,{bi, ...,bv)) = 



{k + '^a^■ bi)/2 

i=l 

sink 



if k + X]r=i ■ ki is even and k yf sink 
otherwise 
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For inequalities, FA(^“^^ Ui ■ Xi < c) = {K, S, S, e, F), where 

K = {k \ ai < k < tti V 0 < fc < — c V — c < /c < 0} 

aj<0 ai>0 

S = {0, 1}*', e = — c, F={k\k£K/\k< 0}, 

V 

6{k, (6i, by)) = [{k + '^a^■ bi)/2\ . 

i=l 

Moreover, conjunction, disjunction and negation of constraints can be im- 
plemented by automata intersection, union and complementation, respectively. 
Finally, if some variable is existentially quantified, we can compute a non- 
deterministic FA accepting the projection of the initial FA on the remaining 
variables and then determinize it. The resulting FA may not accept all satis- 
fying encodings (with any number of leading zeros). We can overcome this by 
recursively identifying all rejecting states k such that 6{k, (0, 0, ..., 0)) G F, and 
make them accepting. Universal quantification can be similarly implemented by 
the use of the FA complementation. 

4 Widening Arithmetic Automata 

Before formally defining a widening operator for arithmetic automata we briefly 
describe the intuition behind it. Let A and A! be two automata representing two 
consecutive members of a sequence Aq, A\, . . whose limit Aqo is the exact least 
flxpoint we are trying to compute. Since A^o is the union of all A^s, it can be seen 
as a product automaton with each state being a tuple (of possibly infinite size) 
containing a state from each Ai. First, consider a string w and assume that after 
consuming w A, A' and A^o move to state fc, k' and koo respectively. Then koo 
contains k and k'. Second, consider states k and k' of A and A' respectively, such 
that the languages accepted from k and k' are the same. Then again there exists 
a state fcoo of A^o that contains both k and k' . For either scenario, our widening 
method, given A and A! as an input, produces an automaton that accepts both 
languages of A and A! and in which k and k' are merged in a single state. 

Given two finite automata A = (AT, E, 6, e, F) and A! = (AT', U, i5', e', F') we 
define the binary relation =v on AT U AT' as follows. Given k G K and k' G AT', 
we say that k =v k' and k' =v k if and only if 

yw G S*. 6*{k,w) G F S'*{k',w) G F' (4) 

or k,k' sink A3w G E* . S* (e,w) = k A S'*{e' ,w) = k' , (5) 

where S*{k, w) is defined as the state A reaches after consuming w starting from 
state k. In other words, condition 4 states that k =v k' if the languages accepted 
by A from k and by A' from k' are the same. Gondition 5 states that k ='^ k' if 
for some word w, A ends up in state k and A' ends up in state k' after consuming 
w. For k\ G K and ^2 G AT we say that k\ =v ^2 if and only if 

3k' G AT'. ki =v k' A ^2 =v k' V 3k G K. k\ =v k Ak 2 =v k (6) 
Similarly we can deflne fcj =v k '2 for G K' and k '2 G AT'. 
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It is easy to prove that =v is an equivalence relation. Call C the set of 
equivalence classes of =v- We define AVA' = {K” , S, S", e", F") by: 

K" = C 

5"{ci, a) = Cj s.t. (Vfc G CiC\ K. 5{k, a) G Cj V S{k, a) = sink) A 
(Vfc' G CiD K' . 5'{k' , a) G Cj V S'{k' , a) = sink) 
e" = c s.t. e G c f\e! G c 

F" = {ci, C 2 , ..., c„} s.t. 'ici^Ak G F F' . k G Ci 

In other words, the set of states of AV A' is the set C of equivalence classes of 
=v- Transitions are defined from the transitions of A and A' . The initial state 
is the class containing the initial states e and e' . The set of final states is the 
set of classes that contain some of the final states in F and F' . The following 
Theorem states that V satisfies condition (3). 

Definition 1. Given an automaton A = (K, S,S,e, F) and a state k G K, we 
define L{k) to be the language accepted by the automaton {K,S,6,k,F). Also 
L{A)=L{e). 

Theorem 1. Given two automata A and A! , L{A) U L{A') C L{A^ A'). 

Proof. Essentially we want to prove that given w G S* such that w is accepted 
by A or A' then w is also accepted by AV A' . Without loss of generality we may 
assume that w is accepted by A. Let w = a^ui . . . cr„. Then there is a sequence of 
non-sink states fco, fci, . . . , kn+i such that k^ = e, 6{ki, cfi) = ki+\ and G F. 
From the definition of AVA' = {K” , S, 6", e", F") it follows that there exists a 
sequence cq, ci, . . . , c„+i such that cq = e", S"{ci, ai) = Ci+i and ki G Ci for all 
0 < t < n -I- 1. Since kn+i G c„+i and G F it follows that c„+i G F" and 
thus w is accepted by A\7 A' . 

According to the original definition [9] , a widening operator has to guarantee 
convergence. Our widening operator does not guarantee convergence. Neverthe- 
less, we can force it to converge by a slight modification. If we discard the con- 
dition k, k' yf sink from equation 5, for each state in one automaton there exists 
an equivalent state in the other. Thus, the produced automaton has at most as 
many states as the smaller operand. As a result, the automata in the sequence 
can not increase in size. There are finitely many automata with a given number 
of states and a fixed alphabet. On the other hand, the size of the set of states 
represented by the automata in the sequence is monotonically increasing, other- 
wise we would have reached a fixpoint. Consequently, the sequence will converge. 
One can deploy this technique when the number of iterations has become too 
high. However, we decided not to use it in our implementation (see Section 5), 
because in most of the cases this modification in the widening operator makes 
the approximation too coarse to prove any property. 

On the other hand, in the sequel we show that for a class of systems, if 
the approximate computation converges, it computes the exact set of reachable 
states. 
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Definition 2 . An automaton Ai = {K\, S, (5i, ei, Fi) is called weakly equivalent 
to automaton A2 = {K2, S, 62,02, F2) iff there exists a total function f : Ki \ 
{sink} K2 such that Si{k,a) = sink or f{Si{k,a)) = 52{f{k),a) for all 
k G Ki \ {sink} and a G S. Furthermore, /(ei) = 62 and for all k\ G F\, 
f{ki)GF2. 

Lemma 1. If automaton Ai = {Ki, E,Si,ei, Fi) is weakly equivalent to au- 
tomaton A2 = {K2, S, 62, e2, F2) then state ki G Ki\ {sink} is mapped to state 
fe G K2 iff for all w G S* , = k\ ^ Sffe2,w) = ^2. 

Lemma 2 . If automaton A\ = {Ki,E,6i,ei,Fi) is weakly equivalent to au- 
tomaton A2 = {K2, S, 62, e2, F2) then, if state k\ G K\ is mapped to state 
fe G K2 then L(fci) C L{k2). 

The proofs of Lemma 1 and Lemma 2 are trivial and have been omitted. 

Definition 3. An automaton A = (K, S, 6, e, F) is called state- disjoint iff L(ki) 
n L(fc2) = 0 for all ki^k2G K. 

Lemma 3. Consider a transition system and an approximate sequence Sq, 
S{, . . . as defined in Section 2 . If XX . I U post{X, R) has a least fixpoint repre- 
sented by a state- disjoint automaton A^o and the automaton Ai representing S{, 
i > s is weakly equivalent to A^o, then the automaton representing is 
also weakly equivalent to A^o- 

Proof. Let A- represent S'- U post{Sl,R). Then Aj+i = Let At = 

{K,S,6,e,F), A'. = {K',S,S',e',F') and A^ = {K^, E,Soo,eoc, F^). Note 
that by Lemma 2 and the monotonicity of XX . I Llpost{X, R), L{Ai) C L{A'f) C 
L(Aoo). Hence, the states of A^+i are of two kinds: classes of =v that contain 
states from both Ai and A' and singleton classes containing one state from A'. 
Recall that Ai is weakly equivalent to Aq^ with respect to a function /. 

First we show that for any two distinct states ki, k2 of Ai that belong to the 
same class, f{ki) = /(fe). States k\ and are in the same class iff they are 
both =v to some state k' of A'. This can happen in two ways. First, assume that 
there exist wi,W2 G S* such that 6*{e,wi) = ki, S*{e,W2) = ^2, 6'*{e',wi) = k' 
and 6'*{e' ,W2) = k' . Then by Lemma 1, (5J^(eoo,wi) = /(fci) and 5}^{eao,W2) = 
/(fe). Since L{Ai) C L(A') C L(Aoo), 0 ^ L(fci) U L{k2) Q L{k') C L(/(fci)) n 
L{f{k2)). Since A^o is state-disjoint, we conclude that f{ki) = 7(^2) • Second, 
assume that L{k') = L{k2) and for some w G S* , 6*{e,w) = k\ and 5 '*{e' ,w) = 
k'. Using similar arguments we conclude that 0 ^ L{ki) C L{k') = C 

n L(/(fc2)) and therefore f{ki) = f{k2). 

Now we can prove that A' is weakly equivalent to A^o- To do so, we define 
a function f : K' ^ K^o according to Definition 2. Given any state k' G K' 
such that fc' =v k for some k G K (i.e., k' belongs to a non-singleton class), 
we define f'{k') = f{k). Now we can show that all transitions from k' con- 
form to Definition 2. First we consider transitions to states in non-singleton 
classes. If L{k') = L{k) then for all a G S, L{S'{k',a)) = L{S{k,a)). Conse- 
quently, S'{k',a) =v 6{k,a) and thus f'{S'{k',a)) = f{S{k,a)) = 5 oo{f{k),a) = 
^oo{f'{k'), a). Now if L{k') ^ L{k), then there exists w G E* such that 6*{e, w) = 
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k and S'*{e',w) = k' . Then S*{e,w.a) = S{k,a) and 5’*{e' ,w.a) = S'{k',a), thus 
S{k,a) =v 6'{k',a) and again we can conclude that f{5'{k',a)) = 6oo{f{k'),cr). 
Now let us consider transitions going to singleton states, for which /' has not 
been defined yet. Assume S'{k',a) is such a state. We define f'{S'{k',a)) = 
SoD{f'{k'),a). Now we need to prove that there is no state k” G K' that be- 
longs to a non-singleton class and symbol a' G S such that S'{k", a') = 5'{k',a) 
and f {S' {k” , a')) yf Soo{f'{k"),cr'). If that were the case, we can show that 
L{SoD{f'{k'),(j)) would intersect L{Soo{f'{k"),a')), which contradicts the hy- 
pothesis. Proceeding in the same manner we can show that f' can be defined 
for the rest of the states, so that A- is weakly equivalent to A^o- Finally, Ai+i 
is weakly equivalent to A^o, since it is constructed by merging states of A' that 
have the same /'. 

Theorem 2. Consider a transition system and an approximate sequence 
as defined in Section 2. If XX . I U post{X, R) has a least fixpoint 
represented by a state- disjoint automaton A^o and the automaton Ag represent- 
ing S'g is weakly equivalent to A^o, then if the sequence converges, it will converge 
to the exact least fixpoint. 

Proof. Automaton Ag, i.e., the last automaton that has been computed without 
widening, is weakly equivalent to A^q. By Lemma 3, all Aj, i > s, are also 
weakly equivalent to A^o. Since L{Aq) C L{Ai) C . . . C L(Aoo), the limit of the 
sequence is L(Aoo), which represents the least fixpoint of AX . I Upost{X,R). 

Corollary 1. Consider a transition system with one integer variable x that is 
initially set to 0 and is increased by a constant c at each step. Then the approx- 
imate fixpoint corresponds to the exact set of reachable states. 

Proof. Clearly all reachable states satisfy 3j/ > 0 . a; = c • y. If c is odd, the 
automaton A^o representing this constraint has states 0, 1, . . . , c — 1, where L{n) 
contains all non-negative integers for which the remainder of division with c 
is n. State 0 is the initial and only accepting state. Also (5(0, 0) = 0. Clearly 
L{n) n L{m) = 0 whenever n m, therefore A^o is state-disjoint. Moreover, the 
automaton Aq representing the initial state a; = 0 has only one accepting and 
initial state that loops 0 and sends 1 to sink. Obviously Ag is weakly equivalent 
to Aoo and hence the hypothesis of Theorem 2 holds. Consequently, the exact set 
of reachable states will be computed. If c is even, it can be written as c = 2" • (i, 
where d is odd. Then every number divisible by c consists of a prefix of n zeros 
and a suffix that is divisible by d. Following similar arguments as before we can 
conclude that the hypothesis of Theorem 2 holds, if the widening seed is 1. 

Note that the class of systems that satisfy the hypothesis of Theorem 2 is 
quite large. Corollary 1 is just an example of such a system. Figure 1 illustrates an 
example fixpoint computation for the system described in Corollary 1, when c = 
3. The first column shows the automata representing the approximate fixpoint 
iterate Si, the second column shows the automata representing S'' = SiUpost{Si) 
and the third column shows the equivalence classes on the states of S'. For each 
i, S,+1 = S,VS'. 
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Fig. 1. Example fixpoint computation for increment by 3 



module square ( ) 
integer y,z; 

parameterized integer x; 
initial: y=0 && z=x; 
square: z>0 && z’=z-l kk y’=y+x; 
spec: AG([z=0 => ((x<=3 => y<=9) kk 
(x=4 => y=16) kk (x>=5 => y>=25))]) 
endmodule 



module con_decr() 
integer x,y; 
initial: x>=0 kk y>=0; 
con_decr: y>0 => (x’=x-l kk y’=y-l); 
spec: AF([x=0]) 
endmodule 



Fig. 2. Example specifications of transition systems 



Our second example, shown in Figure 2 as module square, is a transition 
system that computes the square of an integer parameter x by iteratively adding 
a; to a variable y that is initially set to 0. Variable z is used to count the iterations. 
The set of reachable states of the system contains exactly those states where y = 
{x — z) ■ X and is obviously non-regular. Hence, no forward fixpoint computation 
is expected to converge since the fixpoint cannot be represented in a finite way. 
Also, the closure of the transition relation of the system is not regular and thus 
cannot be computed exactly. On the other hand, there are meaningful properties, 
like the one shown in Figure 2, which can be verified using a backward fixpoint 
computation. The exact computation does not converge. However our algorithm 
terminates after 10 iterations and indeed verifies the property. 

Finally, the third example illustrates the use of the collapse operator V, 
the dual of the widening operator V. The collapse operator should satisfy the 
condition that given two sets A and B, AO B ^ AVB. Given a widening oper- 
ator V, one can trivially define a collapse operator as: AVB = For 
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other representations for integer sets (e.g. Polyhedra [7], or composite disjunctive 
representations [17]), negation is an expensive operation and therefore this defi- 
nition is not efficient. However, for deterministic automata, negation has linear 
complexity in the number of states (it suffices to make accepting states rejecting 
and vice versa), thus the above definition can be implemented efficiently. 

The collapse operator is useful in disproving liveness properties and produc- 
ing counter examples for them, whenever the exact fixpoint computation does 
not converge. As an example, consider the system shown in Figure 2 as module 
con_decr. There are two non-negative variables x and y with arbitrary initial 
values. We keep decrementing them concurrently while y is positive. We want 
to check whether AF(x = 0) i.e., x will eventually become 0 for all execution 
paths. It is easy to see that this does not hold when x> y. The exact backward 
fixpoint computation for AF(a; = 0) does not converge. If we stop the computa- 
tion after a fixed number of iterations, we can obtain an under-approximation 
of AF(a: = 0) and observe that it does not include the set of initial states, thus 
the property might not hold. To prove that, we need to verify the negation of 
the initial property, namely EG(a; yf 0). Once again the exact fixpoint compu- 
tation does not converge. However, an approximate fixpoint computation that 
uses our collapse operator converges in 3 steps and indeed computes the exact 
set of states that violate AF(a; = 0), namely those states where x > y. Note 
that since the sequence of fixpoint iterates for EG is decreasing, we cannot use 
widening. 

5 Experiments 

Widening can be used for two purposes: 1) As we explained above, it can be 
used when the exact fixpoint computations do not converge; 2) It can be used to 
speed up fixpoint computations that would otherwise converge but only after a 
large number of iterations. To demonstrate this fact we repeated the experiments 
in [2] using widening as well. 

We integrated the construction algorithms in [2, 3] as well as the approximate 
fixpoint computation algorithms based on the widening technique presented ear- 
lier to an infinite state GTL model checker called Action Language Verifier (ALV) 
[8] built on top of the Gomposite Symbolic Library [17]. In our experiments we 
compare the running times of the exact and approximate forward and backward 
fixpoint algorithms. For the same experiments we also present the running time 
of BRAIN [15]. BRAIN is a reachability analysis tool, that 1) uses Hilbert’s basis 
as symbolic representation for integer sets, and 2) computes the exact backward 
fixpoint iterations. There is no approximation operation in BRAIN for the fix- 
point computations which do not converge. 

We experimented with a set of examples taken from the BRAIN distribu- 
tion available at: http://www.cs.mctn.ac.uk/~voronkov/BRAIN/ and the ALV 
distribution available at: http://www.cs.ucsb.edu/~bultan/composite/. We 
obtained the experimental results on a SUN ULTRA 10 work station with 768 
Mbytes of memory, running SunOs 5.7. The results are presented in Table 1. For 
the approximate fixpoint computations we also report the seed used for widen- 
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Table 1. Experimental results. Time measurements appear in seconds 



Problem Instance 


BRAIN 


exact 

forward 


exact 

backward 


ALV 

approximate 

forward 


seed 


approximate 

backward 


seed 


CSM4 


3.76 


oo 


99.35 


0.21 


0 


79.29 


5 


CSM6 


25.01 


oo 


540.88 


0.21 


0 


482.11 


7 


CSM8 


128.54 


oo 


1772.85 


0.21 


0 


3782.51 


9 


CSMIO 


494.03 


oo 


4809.13 


0.21 


0 


T 


11 


CSM12 


1644.33 


oo 


9676.81 


0.20 


0 


T 


13 


CbMinvlO 


0.93 


oo 


0.58 


0.18 


0 


0.61 


4 


CSMinv20 


3.57 


oo 


0.90 


0.18 


0 


0.79 


4 


CSMinv30 


9.59 


oo 


1.09 


0.19 


0 


1.01 


4 


CSMinv40 


20.71 


oo 


1.20 


0.19 


0 


1.08 


4 


CSMinv50 


38.58 


oo 


1.45 


0.19 


0 


1.36 


4 


bigjava 


11244.60 


oo 


T 


T 


0 


T 


0 


bigjavainv 


2641.05 


2.85 


82.33 


9.78 


0 


43.17 


10 


bigjavainvl 


30615.20 


8.57 


1160.09 


8.32 


0 


26.13 


1 


consistencyprot 


1.09 


oo 


24.28 


0.34 


0 


26.85 


7 


consistencyprot inv 


7.75 


1.16 


59.38 


1.14 


0 


11.55 


1 


consistencyprot invl 


0.05 


oo 


0.16 


0.20 


0 


0.17 


4 


consprod 


11346.40 


1.85 


T 


1.39 


0 


T 


4 


consprodinv 


1.27 


oo 


0.66 


139.17 


0 


0.54 


5 


tickct2 


★ 


0.13 


oo 


0.12 


0 


0.13 


0 


tickets 


★ 


0.45 


oo 


0.45 


0 


0.56 


0 


ticketd 


★ 


2.93 


oo 


2.89 


0 


6.77 


0 


coherence 


★ 


oo 


oo 


0.23 


0 


0.13 


0 


bakeryS 


0.35 


oo 


0.38 


7.95 


0 


0.44 


4 


bakeryd 


14.82 


oo 


9.83 


1681.85 


0 


10.09 


5 


bakeryS 


1107.75 


oo 


577.45 


T 


0 


582.43 


6 



ing. Entries of t mean that the computation was aborted because it did not finish 
in 5 hours or the memory limit was exceeded. Entries of oo mean that the exact 
fixpoint computation does not converge. Finally, * means that we are checking 
a liveness property that cannot be handled by BRAIN. Problem instances can 
be categorized in three groups: 

1. Pure integer problems (CSM, bigjava, consistencyprot and consprod) 

2. Integer problems with invariants (those with the suffix inv) 

3. Problems with both boolean and integer variables (ticket, coherence, bakery) 

The problems with invariants are obtained from the original problems by con- 
joining the transition relation with a set of invariants. A typical invariant has the 
form a;i a;fe < m, where m is a natural number. Such invariants essentially 

bound the variables Xi, ...,Xk to a finite region. 

We analyzed each of the problem instances with three different configura- 
tions of ALV using two exact fixpoint computation algorithms (forward and 
backward) and two approximate fixpoint computation algorithms based on our 
widening technique. For each problem we chose a widening seed that makes our 
approximation precise enough to allow us to verify the properties. For the for- 
ward case, the lowest possible value for the seed, 0, is adequate. However, for 
the backward case we had to set the seed higher to achieve the required preci- 
sion. In this sense our technique is not fully automatic. Nevertheless, one could 
automate the choice of widening seed by iteratively trying all possible values, 
starting from 0, until the property is verified. 
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For almost all problem instances, one of the configurations of ALV (depending 
on the choice of exact or approximate algorithm, forward or backward fixpoint, 
and the value of the widening seed) is faster than BRAIN. Note that according 
to [15] BRAIN outperforms other infinite state model checkers: Hytech, DMC 
and a version of ALV that uses a Polyhedral representation for arithmetic sets. 
The only two exceptions are consistencyprotinvl, for which the difference of 
performance is very small, and bigjava, for which ALV runs out of memory. 

Among all fixpoint computation algorithms used in ALV, the approximate 
forward algorithm is the faster for most of the problems with the approximate 
backward algorithm coming second. This indicates that the use of widening can 
speed up fixpoint computations significantly even when the exact computations 
converge. We believe that the fact that the forward algorithm is usually faster 
than the backward algorithm is due to the specifics of each problem instance 
and is not a general rule. A characteristic example is consprod and consprodinv. 
Another observation is that while the approximate backward algorithm performs 
well for the invariant versions of the problem instances, it usually performs poorly 
for the non-invariant versions. This does not happen with the forward algorithm. 
An explanation to this fact is that each of the forward fixpoint iterates natu- 
rally satisfies the invariants. However, this is not true for the backward fixpoint 
iterates, whose size is reduced when intersected with the invariants. 

The problem instances ticket2, tickets, ticketd and coherence could not be 
solved by any exact backward algorithm. For ALV, the exact backward fix- 
point computation diverges whereas the forward fixpoint computation converges. 
BRAIN cannot handle liveness properties as the ones specified in ticket and co- 
herence. On the other hand, both approximate algorithms were able to verify the 
properties. Furthermore, while the exact forward fixpoint computation diverges 
for most of the problem instances, the approximate one converges relatively fast 
for almost all of these problem instances. This shows that in practice our widen- 
ing technique can be successfully applied to non-trivial systems, whereas in [4] 
only very simple systems are considered. A problem of special interest is bakery. 
Our exact backward fixpoint computation always converges and scales better 
than BRAIN, whereas the exact forward fixpoint computation always diverges. 
Widening does not help much for this problem. The approximate forward fixpoint 
computation does not scale well. The approximate backward fixpoint computa- 
tion is precise enough only when widening is used in the last iteration and thus 
it takes a little longer than the exact computation to finish. 

Finally, we repeated all experiments using another version of ALV in which 
integer sets are symbolically represented as polyhedra and manipulated by the 
Omega Library [1] . This version uses an extension of Halbwachs’ widening al- 
gorithm [10] to Presburger arithmetic [7]. For the ticket, coherence and bakery 
problems, we could verify the properties using both forward and backward ap- 
proximate fixpoint computations but the running times are much higher than 
those for the automata version. For the CSM problem instances, the approxi- 
mate forward fixpoint computed was not precise enough to verify the properties 
immediately. Due to internal limitations of the Omega Library we could not get 
results for the rest of the problem instances. 




Widening Arithmetic Automata 333 



References 

1. The Omega project, http://www.cs.umd.edu/projects/omega/ 

2. Constantines Bartzis and Tevfik Bultan. Efficient image computation in infinite 
state model checking. In Proceedings of the 15th International Conference on Com- 
puter Aided Verification (CAV 2003), volume 2725 of Lecture Notes in Computer 
Science, pages 249-261. Springer, 2003. 

3. Constantines Bartzis and Tevfik Bultan. Efficient symbolic representations for 
arithmetic constraints in verification. International Journal of Foundations of 
Computer Science, 14(4):605-624, 2003. 

4. Bernard Boigelot, Axel Legay, and Pierre Wolper. Iterating transducers in the 
large. In Proceedings of the 15th International Conference on Computer Aided 
Verification , volume 2725 of LNCS, pages 223-235. Springer, 2003. 

5. Ahmed Bouajjani, Bengt Jonsson, Marcus Nilsson, and Tayssir Touili. Regular 
model checking. In Computer Aided Verification, pages 403-418, 2000. 

6. A. Boudet and H. Comon. Diophantine equations, Presbnrger arithmetic and finite 
automata. In Proceedings of the 21st International Colloquium on Trees in Algebra 
and Programming, volume 1059 of LNCS, pages 30-43. Springer- Verlag, April 1996. 

7. T. Bultan, R. Gerber, and W. Pugh. Model-checking concnrrent systems with 
unbounded integer variables: Symbolic representations, approximations, and ex- 
perimental results. ACM Transactions on Programming Languages and Systems, 
21(4):747-789, July 1999. 

8. T. Bultan and T. Yavuz-Kahveci. Action language verifier. In Proceedings of the 
16th IEEE International Conference on Automated Software Engineering, 2001. 

9. P. Cousot and R. Cousot. Abstract interpretation: A unified lattice model for static 
analysis of programs by construction or approximation of fixpoints. In Proceedings 
of the 4th Annual ACM Symposium on Principles of Programming Languages, 
pages 238-252, 1977. 

10. P. Cousot and N. Halbwachs. Automatic discovery of linear restraints among 
variables of a program. In Proceedings of the 5th Annual ACM Symposium on 
Principles of Programming, pages 84-97, 1978. 

11. Denis Dams, Yassine Lakhnech, and Martin Steffen. Iterating transducers. In 
Computer Aided Verification’ 01, 2001. 

12. N. Halbwachs. Determination automatique de relations lineaires verifiees par les 
variables d’un programme. PhD thesis. University of Grenoble, March 1979. 

13. N. Halbwachs, Y. E. Proy, and P. Roumanoff. Verification of real-time systems 
using linear relation analysis. Formal Methods in System Design, 11 (2): 157-185, 
August 1997. 

14. Bengt Jonsson and Marcus Nilsson. Transitive closures of regular relations for 
verifying infinite-state systems. In Tools and Algorithms for Construction and 
Analysis of Systems, pages 220-234, 2000. 

15. Tatiana Rybina and Andrei Voronkov. Using canonical representations of solutions 
to speed up infinite-state model checking. In Proceedings of the 14 th International 
Conference on Computer Aided Verification, pages 400-411, 2002. 

16. P. Wolper and B. Boigelot. On the construction of automata from linear arithmetic 
constraints. In Proceedings of the 6th Conference on Tools and Algorithms for the 
Construction and Analysis of Systems, LNCS, pages 1-19. Springer, April 2000. 

17. T. Yavuz-Kahveci, M. Tuncer, and T. Bultan. Composite symbolic library. In 
Proceedings of the 7th International Conference on Tools and Algorithms for the 
Construction and Analysis of Systems, volume 2031 of Lecture Notes in Computer 
Science, pages 335-344. Springer- Verlag, April 2001. 




Why Model Checking 
Can Improve WCET Analysis 



Alexander Metzner 

Safety critical Embedded Systems group 
Department of Computer Science 
Carl von Ossietzky University Oldenburg 
metznerSinf ormatik . uni-oldenburg . de 



Abstract. Calculating predictions for an upper bound of the execution 
time of real-time tasks in embedded systems is a necessary step in de- 
signing such systems. There exist successful analysis methods, based on 
abstract interpretation and integer linear programming (ILP) for that 
problem. In [12] it is stated, that model checking is not adequate for this 
task. The approach presented in this paper shows that model checking is 
adequate and, furthermore, can improve the results. This is done by defin- 
ing an automaton based semantic for control flow graphs of programs for 
abstract and concrete instruction cache analysis. A binary search based 
bunch of model checker runs is used to calculate the upper bound of 
execution time. 



1 Introduction 

Schedulability analysis is one of the key factors in the design of a hard real-time 
system, that has to fulfill timing constraints stemming from the interaction with 
a physical environment. Particularly, in safety critical systems, as used in the do- 
mains of automotive, avionics or power-plant electronics, the guarantee to meet 
given deadlines is a stringent property of the system. To prove this property, 
common schedulability analysis methods [6] need an upper bound for the execu- 
tion time of programs, so-called worst case execution time (WCET). This bound 
has to be safe and accurate: it will never under-estimating the real behaviour. 
Furthermore it should be as close as possible to the real maximal execution time 
of the program. Deriving such execution time bounds from software is a challeng- 
ing task, since modern processor and cache architectures have to be taken into 
account. The dynamics of programs, ie. which paths through a program are valid 
or not, also influence the WCET, independent from the architecture. Therefore, 
usually the analysis of WCET is separated into two tasks, namely the low level 
analysis (dealing with architectural features) and the high-level analysis (dealing 
with program paths) [11]. In this paper, we focus on the first one, but give some 
hints how possibly to combine this with a high-level analysis. 

There exist several approaches to tackle the low-level analysis with respect 
to caches and pipelines. In general, the low level analysis consists of two sub- 
problems. Firstly, the timing behaviour of instructions of a program must be 
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predicted. Secondly, the longest path through the program must be calculated 
by using the predicted instruction timing. In [10] an abstract cache behaviour 
classification is defined, that statically predicts the worst case behaviour of each 
memory reference. [11] derives a similar classification of memory accesses using 
an abstract interpretation for cache behaviour prediction and integer linear pro- 
gramming for finding the longest path by a method called implicit path enumer- 
ation[7]. Only ILP is used in [8] for pipeline and simple cache analysis, whereas 
[3] uses data-fiow analysis methods for pipeline and cache analysis. 

Both tasks, the behaviour prediction and the longest path finding problem, 
can be solved by a model checker [9]. Since a model checker traverses the entire 
state space of a model and therefore returns a concrete path in the program, 
better results can be obtained than all mentioned approaches above, because of 
their abstract nature. However, [12] claims model checking to be an inadequate 
technique for this analysis task. The approach presented in this paper will show, 
that model checking can be used for WCET analysis and furthermore it could be 
a benefit to use it. In order to prove this, we have implemented an experimen- 
tal framework for an instruction cache analysis within the OFFIS verification 
environment [1] based on the VIS model checker [4]. 

2 Basic Modelling 

Calculations of the WCET for programs are performed on the control flow graph. 
In this section we define the construction of the control flow graph of a program 
from the program code. Further, we define an automaton based representation 
of the control flow graph that is used in the next chapter to build the semantics 
for WCET analysis. 

The starting point of a WCET analysis in a low level manner are executables, 
object code or assembler code. Our approach analyses timing behaviour of a 
program at assembler code level. For effectiveness reasons, the control flow graph 
is built from blocks of assembler instructions instead of single instructions. The 
idea is, to merge strongly connected instructions to so called basic blocks[10], 
that consists of the longest sequence of instructions with only one entry and one 
exit. 

Definition 1 (Basic Block). 

A basic block is constructed from a control flow of an assembler program by merg- 
ing sequential instructions to blocks of instructions. Only the first instruction of 
a block may be target of a jump and the last instruction must be a jump or the 
last instruction of the program. There are no other jumps within this block. 

Definition 2 (Basic Block Graph (BBG)). 

A basic block graph is a tuple B = {BB,Tb,L) with: 

— BB set of basic blocks of a program 

— Tb Q BB X BB transition relation according to the control flow 
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— £ C BB X BB X INq x IN’*" the labelling relation for loops (minimal and 
maximal iteration count) 

For each basic block bi € BB that is a head of a loop the following attributes are 
defined: 

— back jump : BB 2^^ : All source basic blocks that jump backwards to bi 

— exits : BB 2^^ : All basic blocks which are outside the loop headed by bi 
and which are targets of basic blocks inside the loop 

— cost : BB INq the number of processor cycles of an execution of each basic 
block in the worst case 

Note, that the assembler instructions of each basic block are considered in 
the cost attribute. The attribute backjump denotes all basic blocks that lead 
to a further iteration of loops, whereas exits denotes all blocks that finish the 
loop. The labelling relation must be defined by the user or a high level analysis 
in order to avoid unbounded loops and defines the possible interval of iterations 
for each loop. It has to be defined for each backjump basic block to the loop 
head. 

For simplicity reasons, we restrict our analysis to the prediction of instruction 
cache related timing, so an analysis of the pipeline behaviour with respect to 
timing is assumed to be given with the cost attribute. (Below, we will give a 
hint, how the integration of pipeline analysis is possible, too). 

The basic block graph can be statically derived from the assembler program 
and represents a control flow graph of the program. In order to calculate a 
safe upper bound of the execution time by using model checking techniques, 
we represent the basic block graph by a finite state machine, that contains all 
possible paths through the program. 

Definition 3 (Basic Block Automaton (BBA)). 

The Basic Block Automaton is a tuple BBA = (sq, S't, S^T, V, T) with: 

— S the set of states 

— So G S' the initial state 

— St Q S the set of termination states 

— T the set of inputs (used to resolve non- determinism) 

— V the set of variables (used for loop counters, cache contents and consumed 
time) 

— T C SU{T} X S the transition relation. The T element is used to define the 
init transition. 

For each transition t GT the following functions are defined: 

— guard : T — > BEXPBf'^^ a guarding condition of type boolean expression 

— action : T with ASSIGNy = {v -.= e \ v & V A e & 

EXPR^'^^} an action on each transition to conditionally update the sys- 
tem variables. EXPR is defined as usual, with the notation of c?ei : e 2 for 
conditional expressions. 
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We will define the semantics of a BBG in terms of a BBA. The main idea 
is to represent all paths through a BBG by an automaton that jumps each step 
from basic block to basic block. A sequence of states represents a path in the 
BBG. On each step the emerging costs of the successor state are added to a 
global variable, which contains the consumed time (from the init state to the 
actual state), ie. the runtime on the actual path. In the next chapter we will 
show for two different kinds of semantics, how this is done with respect to the 
cache architecture and the BBG. 

3 Two Different Semantics for BB Graphs 

The well known method from [11] to calculate safe and accurate WGETs uses 
a mixture of abstract interpretation (AI) for cache behaviour classification and 
integer linear programming to derive the worst case path through a BBG. To 
handle this problem by model checking, one can think of mainly two different 
approaches: First, model checking can replace the integer linear programming 
in order to solve only the subtask of finding the longest path. Secondly, it can 
replace both subtasks, the path finding as well as the prediction of cache be- 
haviour, to solve the entire problem. Ghanging only the method of critical path 
finding will not lead to more accurate WGETs with respect to ILP, but it will 
help to avoid ILP specific problems. In section 3.1 the semantic of this approach 
is described. 

On the other hand, employing the model checker for precise cache behaviour 
prediction can improve the results due to the avoidance of over-approximations 
of the abstraction methods of AI. We describe the semantics for precise cache 
behaviour prediction in section 3.2. 

3.1 Model Checking for Path Finding 

If the task for model checking is reduced to finding the longest path in a BBG, 
we have to assume a pre-calculated prediction of the behaviour of cache accesses. 
This is done by the prediction of the worst case hit/miss behaviour and is de- 
rived statically by an implementation of the abstract interpretation formalisation 
found eg. in [11]. The idea is to collect abstract states of the cache contents in 
order to compute an over-approximation of timing. In the next section, when we 
show why model checking can lead to more accurate results, this method will be 
explained in more detail. Here we are only interested in the classification for all 
memory accesses. Each memory access is predicted to behave like always-miss 
(am), always-hit (ah) or first-miss (fm). Memory accesses of type “am” will pro- 
duce a cache miss each time, “fm” means that only the first access leads to a 
cache miss, all further accesses will produces cache hits. Finally, “ah” indicates 
a cache hit for every access to this memory location. 

Since the approach presented in this paper is restricted to the behaviour 
of instruction caches, this analysis is performed for each instruction in a basic 
block bi- For efficiency reasons, the cache behaviour prediction of instructions 
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within one basic block are merged and the syntax of a BB is extended by the two 
mappings (all accesses that are not handled by these two mappings are treated 
as always hit): 

— get-am : BB —>■ INq number of always miss in a BB 

— get-fm : BB INq number of first miss in a BB 

Exploiting this knowledge we can abstract from the concrete cache architecture. 
We only have to consider the penalty of cache hits and misses (in processor 
cycles). For pipeline architectures we assume, that each instruction can be ex- 
ecuted within one cycle, including the hit penalty of the instruction memory 
access (pipelined RISC concept). 

The main idea of the semantic for path finding is to sum up the emerging 
costs (in processor cycles) of all basic blocks on a concrete path through the 
BBA in a variable cycles. This is performed at the transitions to each block. The 
addend is constant for the fixed costs of instruction execution in the pipeline and 
the cache accesses that are predicted as “am” and “ah” . In order to handle the 
category first-miss, we have to introduce one boolean variable fb^ for each basic 
block bi, if there exist “fm” accesses within this block. The content of the /;,• 
variable indicates, whether this is the first time of access or a subsequent one. 
Consequently, the cost that has to be added to cycles on entering bi contains 
the miss and the hit penalty, respectively. 

In order to resolve the non-determinism that occurs in each branch in the 
BBC (conditional jumps in the assembler program are the source of these non- 
determinism, because at this low level analysis we do not care about the condi- 
tions in the source program), an input ricond is required. Since an input, deter- 
mining the chosen path of a branch, is independent from all other branches in 
each step, one input ricond is sufficient for this task. 

Loop iteration bounds are used to avoid unbounded unrolling of loops. As 
always the case in real-time systems, software with divergent loops is not allowed 
and for the worst case analysis we need the interval of loop iterations for each 
loop. 

Therefore, for each loop a local 
loop counter lb- is inserted, that con- 
tains the actual number of loop unrolls 
for the current path. Figure 1 shows 
the transformation from a loop in the 
BBG to the BBA. For each transition 
to a loop head, that is not a backward 
jump from within the loop, the loop 
counter is initialised with 0. On each 

backward jump towards the loop head, the loop counter is incremented to indi- 
cate a new iteration. Exiting the loop is performed by taking a transition to a 
basic block outside the loop, which we called exit block. The path to an exit block 
always uses branching blocks^, thus the transformation has to follow the rule of 

^ Otherwise it would be no exit block, because if a block within a loop has only one 
successor, this target block must be within the loop, too. 




!((1<3 II n) && 1<5) 




Fig. 1. Semantic transformation of loops 
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introducing an input variable ricond- However, this alone would allow unbounded 
loop iterations, hence we have to take the iteration bounds into account. This 
is done by an extended guard at the transition of branching blocks, that leads 
to a loop exit {^{{hi < 'm-d V ricond) A < m„)). Exiting the loop by using the 
input variable ricond is only possible, if the number of iterations is greater than 
the minimal iteration count (m^) and is enforced if the maximal iteration count 
(to„) is exceeded (see figure 1)^. This leads to the following definition: 



Definition 4 (BBA semantics of a BBG). 

The semantics BBA = {sq, St, S,T,V,T) of a given basic block graph B = 
{BB,Tb, C) is defined by the following rules: 

Let Sfiead = {b \ b G S A 3bi G BB,3l = (6^,6, •,•) G £} be the set of all loop 
heads within B. 



- S = BB 

- So = b with 3bi G BB, tg € Tg : tg = ( 6 , bf) A G BB : $ts G '■ tg = 
(bj,b) 

- St = {b \ b G BB A G BB : G Tg : tg = ( 6 , bi)} 

- I = {Ucond} with Ucond G BOOL 

- V = {cycles} U [Itj \ bj G Stead] U {ft^ \ bj G BB : get_fm{bj) > 0} 

- T = Tg U [Unit] with Unit = (-L, So) 

With a transition t = ( 6 i, 62 ) GT it holds: 



ricond G T 

^ricond G T 

^{{Ibi < TOd V Ucond) 

A hi < mu) 

with Ibi GV A Ucond G X 



guard{t) = < 



{Ibi “A md V Ucond) 

A hi < mu 

with hi G V A Ucond G X 



{TRUE 



if3bi yf 62 G BB,3ti = ( 51 , 6^) G Eg 
A guard{U) = -•Ucond A 6* ^ Stead 
if3bi yf 62 G BB,3ti = {bl,bi) G Eg 
A guardfti) = Ucond Ah ^ Stead 



if3bi G Stead ■ b 2 G exits{bf) 
A 3A = (•, bi, md, mu) G C 



if 3bi G Stead-i3bj G BB, 
3tj G Eg, 3 A G C : 
bj yf &2 A bj G exits{bi) 
Atj = {bi,bj) 

A A = {■,bi,md,mu) 
else 



^ Using the first exit in case of bonnd exceeding is a structural property of assembler 
code for loops that is constructed by common compilers. 
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For each basic block bj let Cam(bj) = get_am(bj) x MISS-PENALTY be the cost 
of a cache miss of type “am” . Let Cfm = 9 Gt-fm{bj) x MISS-PENALTY the 
similar attribute for “fm” classified misses. Let actiorin (assignment for cache 
and pipeline costs) be defined as follows: 



actiounff) = < 



(cycles : — costl^so') + Cami^o') F 
U{/b, := FALSE \ V/fc, e V : b, ^ so} 
U{/,„ := TRUE} 

{cycles := cost{b2) + Camib2) 

+ (/h2)"^C/m(&2) : 0} 

U{/6, := TRUE} 



{{cycles := cycles + cost{b 2 ) + Cam(& 2 )} 



if t — tinit 



if get-fm{b 2 ) > 0 
f\ t ^ tijiit 
else 



To avoid unbound unrolling of loops, actioni is defined: 



{ {lb2 ■= 0 } ifb2 e Shead A &i ^ backjump{b2) 

{^2 := lb2 + 1 } ifh e Shead A &i e backjump{b2) 
0 else 



With this two parts, the action function can be constructed by 
actionft) = actionnft) U actionift) 

This BBA of a given BBG can now be translated to a model for a model checker. 
What we can do with model checking techniques, is the proof of Mbba h 
EF((/)Ar), where ((n is the state formula: 

4’n = { \J bt) A cycles > N 

t'.bt^ST 

The target of this proof is a path to one of the termination blocks whose cost (in 
processor cycles) is greater than N . Since searching for WCET is an optimisation 
problem with the objective function of max{cycles) , we must execute a bunch of 
model check runs to converge to the real WCET. This is performed in a binary 
search manner and we do an update on N each run: If Mbba |= EF((^at) fails, 
the value of N has to be decreased, otherwise it has to be increased^. If we have 
found a proof for Mbba |= EF(</)Ar._i) and the proof of Mbba \= EF((()ArJ fails, 
we have reached the maximum for the variable cycles and thus we have found, 
that the WCET equals Ni. 



3.2 Model Checking for Cache Behaviour Prediction 

As mentioned above, the former approach can compete with other techniques, 
like ILP based methods presented in [11]; it shows that with model checking 
techniques a low level analysis of WCET is possible as well. Furthermore, the 

® Note, that with very slightly changes the BCET can easily be analysed with the 
same technique. 
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ILP method is known to suffer from numerical instabilities that can produce non- 
valid solutions. In contrast, model checking based techniques by construction do 
not suffer from problems like this and always calculate valid solutions. 

To get some improvements on WCET analysis, we try to transfer the pre- 
diction of cache accesses from the statical way to the dynamic way under the 
control of a model check run. This increases complexity, but, however, we can 
avoid too pessimistic over-approximations of the static analysis. 

Static analysis, such as AI based 
methods, do not consider concrete 
paths through a program but work 
with abstract ones. The prerequisite 
for this abstraction is a safe over- 
approximation of the cache behaviour. 

Since model checking considers a con- 
crete path, at some points in a control 
flow graph the pessimistic prediction 
of abstract cache behaviour can be improved. These points are basic blocks, 
that are targets of more than one predecessor block. Regarding the abstract 
cache behaviour, the abstract analysis method is not able to determine, which 
predecessor is used (because this is only known in a concrete path). Therefore, 
at these points so-called jom-functions merge all possible paths in a safe way 
with respect to WCET. 

For example, in figure 2 the lower block loads a cache line, that is used in the 
successor block, but the upper block does not use this line. In order to give safe 
predictions, the join-function does some kind of set intersection of abstract cache 
states with the result, that in an “am” or “fm” analysis this cache accesses in 
both blocks leads to a cache miss for cache line b. This also holds, if the concrete 
path uses the block that pre-loads cache line b. 

To overcome this drawback, we give in this section a BBA based semantic 
for BBGs with dynamic cache analysis. We will reuse parts of the above defined 
semantic, namely most of the construction rules and the whole guard function. 
To model the cache behaviour of concrete paths, we introduce new variables 
to represent the contents of the lines of an instruction cache with given size, 
associativity and other parameters. 

In order to model dynamic cache 
contents on a concrete path, the whole 
cache memory must be taken into ac- 
count. Each access to instruction 
memory is deterministically mapped 
into one set of the cache and belongs 
to a so called “program line” [10]. Pro- 
gram lines are the set of instruction 
memory locations in a program that 
are mapped to the same cache set with 
the same tag (see figure 3). If we assume a critical word first loading strategy 



program lines 




Fig. 3. Abstract cache content representa- 
tion. 









line a x 










line b x 


line b x 













Fig. 2. Merging paths in an abstract cache 
behaviour analysis. Crosses mark cache 
miss prediction. 
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(modern processors usually implement this strategy), only the first access of in- 
structions within one program line to a cache set leads to a miss. Hence, we can 
abstract without loss of precision from collecting each memory location access 
by collecting only the program lines. While traversing a concrete path, we col- 
lect the program lines per cache line and set, that are loaded, and we delete the 
program lines that are replaced (depending on the replacement strategy) . This is 
performed in the sequence of memory accesses on the concrete path. Therefore, 
the definition of a basic block graph is extended by the set of program lines of 
each BB: 

pi : BB 2®^ denotes all program lines of a basic block 

set : 7V+ ^ IN+ gives the cache set that belongs to program line p 

Note, that we assume implicitly, that the size of the cache is sufficient to record 

all program lines of a basic block^. With this additional information we can now 

define the BBA-based semantic for a basic block graph with cache behaviour 

prediction. 



Definition 5 (BBA semantic of BBG with cache behaviour predic- 
tion). 

The semantics BBA = {sq, St, S,T,V,T) of a given basic block graph B = 
(BB,Tis,£) is defined by the following rules: 

— Let Sfiead, So, St,T,T be defined as in definition 4- 

— V = {cycles} U {Ibj | bj € Shead} U {ICg^h \ ICg^t G IN U {T} A g G 
{1, . . . , 5'i?T5'} A h G {!,..., A}} with SETS is the number of cache sets, 
A is the associativity of the cache and T denotes an invalid cache line 

— Let guard(t) and actionift) be defined as in definition 4 

Let the cost of memory accesses of a basic block b on a cache with associativity 
A be defined as 

Ccache{b)= Y. ( y {ICset(p),i=p)yO- MLS S. PENALTY 

p&pl(b) ^i={l,...,A} ^ 

and the cost of entering the first basic block by transition Unit 

bCinit — E MLSS.PENALTY 

pepi(so) 



With t = (61,62) G T, the assignments for cache and pipeline costs are defined: 

\ {cycles := cost{so) + Ctnit] if t = tinit 

acnonn(t) = < 

I {cycles := cycles + cost(b 2 ) + Ccache{b 2 )} else 

The assignments for dynamic cache prediction actiouc is defined with 

hit{p)= y {ICset(p),i=P) 

*={!,. ..A} 



This is the standard case in modern processors, since basic blocks are typically small. 



4 
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{ICp^a :=-L |pG {1, . . . , SETS}\{set{pl{so))} 

A ctG A}} U 

Upepi(^o) {{hit{p))^AGE{p) : LOAD{p)} if t = Umt 

Upepi(fc 2 ) {{hit(p))lAGE(p) : LOAD{p)} else 

With these three parts, the action function can he constructed by 
action{f) = actionn{f) U actionift) U actiondt) 



actiondt) = 



The concrete cache contents (ie. the program line actually stored in the cache 
lines) are represented in {SETS ■ A) IGg,a variables, that are updated on each 
memory reference. The cost of an access depends on the hit/miss state and is 
calculated by checking whether the program lines of the target block are within 
the cache variables or not. In the init transition a completely invalidated cache 
is assumed (worst case scenario). 

The real cache behaviour of loading lines is hidden by the AGE{p) and 
LOAD{p) macros, because they depend on the used cache architecture. For a 
two-way associative cache {A = 2) with the popular LRU strategy, AGE{p) and 
LOAD{p) are defined with s = set{p) (the :=: notation denotes swapping the 
contents): 

AGE{p) = {IGs ,2 = pVICs ,2 :=: ICs,i : < null > 

LOAD{p) = IGs, 2 ■■= ICsp ; /C,,i := p 

For higher degrees of the associativity this is a bit more complex, but the basic 
scheme is the same, even if we have more complicated and less predictable (in 
an abstract analysis) replacement schemes, like the pseudo-LRU used in the 
Embedded Pentium [5]. 

4 Evaluation and Discussion 

For an evaluation of the two described approaches, first, the design- flow and 
construction of the model from an assembler program is described. A very short 
sketch of our model checking environment is given, too. Using this framework, 
the result of an evaluation of three example case studies will be demonstrated 
and discussed. As mentioned above, we restrict the evaluation on the behaviour 
of an instruction cache and assume a “perfect” pipelined RISC architecture with 
a priori generated costs of one cycle per instruction. 

4.1 Design-Flow 

Starting point of the analysis is a C program with user annotations to bound 
the number of loop iterations. This is transfered to assembly language whereby 
the annotations are preserved. From this assembler program, we build the basic 
block graph and generate the BBA, which itself is a C program that represents 
the given semantic as a finite state machine. This is the base for the use of the 
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Fig. 4. The design-flow of WCET analysis using model checking techniques provided 
by the OFFIS verification environment. 



OFFIS verification environment [1], in which the C program is transformed [2] to 
an input language for the model checker core. As model checker engine we use 
the VIS [4]. Figure 4 gives an overview over our design-flow for WCET analysis. 

For the inquiry whether M \= EF(^jv)) we use the drive to property checking 
of the OFFIS VE (technically an invariance check) and produce a witness in 
case of a result of true. The witness contains the value of the cycles variable for 
the concrete path to one of the termination basic blocks. The result is used to 
increase or decrease the bound N, until we reach the termination point, where 
it holds Mbba 1= EF((/)Ari-i) A Mbba EF(0atJ. 

4.2 Experimental Results 

For evaluation of our approach we take three case studies as inputs for the above 
described design-flow to calculate the WCET. The case studies are derived from 
running projects and are located in the domain of embedded systems, partly 
implemented with typical CASE tools of an embedded systems development 
process (see table 1 for details). 



Table 1. Case studies with number of instructions, description and source of the code. 



case study 


instructions 


comment 


robot 


~ 250 


control of a LEGO mindstorm to find a light 
source and react on collisions, hand written C 
code 


collision 


~ 1100 


collision detection and avoidance for a satellite 
control unit, derived C code from Statemate 
specification 


flight 


~ 2500 


flight control unit to compensate side drift, gen- 
erated C code from SCADE 



As cache parameters we choose an instruction cache of 128 sets with 8 instruc- 
tions (4 bytes per instruction) per line and an associativity of two (altogether a 
cache size of 8KB; the size is typical for processors used in todays embedded sys- 
tems). The evaluation machine was a SUN BLADE 2000 (900 MHz UltraSparc 
III processor, 2 GB main memory) and the results are summarised in the table 
2 below. The times in time(A) and time(C) are the time to And the WCET, 
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including witness generation. As mentioned above, the results of WCET(A) are 
identical to the WCET the approach in [12] would calculate. Hence we can di- 
rectly compare [12] and the results from WCET(C) to highlight the improvement 
of our approach. The time of one single model check run ranges from 8 seconds 
for robot to 21 seconds for flight with abstract cache analysis and from 13 sec- 
onds for robot to 315 seconds for flight with concrete cache analysis. However, the 
results have shown, that WCET analysis with model checking techniques is pos- 
sible and can partly improve the results of the abstract analysis on case studies, 
that are typical in their size and complexity for tasks in embedded systems. 

To limit the number of iterations in the binary search procedure for the 
abstract cache analysis, we determine an interval, that was derived at the lower 
bound by a reachability analysis® and on the upper bound by a rule of thumb. 
For the concrete cache analysis we use the result of the abstract analysis and 
the number of joins to decrease the number of iterations (shown by iter. (A/C) 
in table 2). 

Table 2. Evaluation of the three benchmarks with results and runtime for the abstract 
and the concrete cache behaviour analysis and maximal iterations of the binary search. 
The letter A denotes the abstract, C the concrete cache analysis, bits denotes the state 
bits of the problem. 



case study 


WCET(A) 


bits (A) 


WCET(C) 


bits( C) 


time(A ) 


time(C) 


iter. (A/C) 


robot 


387 


35 


376 


75 


0:46 min 


0:36 min 


9/4 


collision 


502 


38 


478 


135 


2:00 min 


4:56 min 


10/5 


flight 


1782 


40 


1749 


362 


2:16 min 


15:24 min 


11/4 



4.3 Discussion and Perspectives 

As mentioned in the last chapter, the improvement of the concrete cache anal- 
ysis compared to abstract cache analysis is founded in the more accurate cache 
behaviour prediction at merging points in the control flow graph. Analysing the 
number of these points can give a good hint whether a concrete cache analysis 
could be profitable or not. In our case studies, we determine that 33% to 50% of 
the joins lead to additional cache miss predictions in the abstract cache analysis 
with respect to the concrete one. As a whole this results in an improvement of 
1.5% to 5% of the predicted WCET, if only instruction caches are taken into 
consideration. 

The investigation of concrete paths increases the complexity, as shown by 
bits in table 2. The used operations (addition and comparison with constants) 
are not expected to lead to state explosion problems. Thus the main factor for 
complexity are the ICa,a variables which depend on the cache architecture and 
the size of the program. Since we are dealing with tasks (parts of functions), 
program sizes greater than the flight examle are quite unusual. Furthermore, 

® The reachability analysis checks for the reachability of one termination block and 
retnrns a witness for the cycles variable within a few seconds on each of the models. 
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cache lines, that are not used, are sliced from the BBA during construction. 
This could also be possible for the associativity: For cache lines that are known 
to store only one program line there is no need for more than one cache line 
in the BBA, thus we can assume for this lines an associativity of one). This is 
currently not implemented but would decrease the number of state bits. 

Regarding pipeline analysis, there occurs the same problem at merging 
points, since without a concrete path the analysis method cannot determine 
which of the predecessor blocks is on the path. Hence there is an over-approxi- 
mation by taking the maximal delay, too. The consequences to the accuracy of 
the execution time computation are hard to predict, because this deeply depends 
on the used processor architecture. In well balanced pipelines there are no great 
differences in the delay of two predecessor blocks, but in pipelines with instruc- 
tions of various execution time intervals there can possibly be ten’s of cycles 
difference (for example using floating point execution units is very popular but 
yields exactly this problem). We conjecture, that with modern processor archi- 
tectures the improvement of a concrete analysis could reach the same magnitude 
as for the cache analysis. Note, that for the pipeline analysis we do not need a 
pipeline behaviour analysis under control of the model checker. A pre-analysis 
can provide path specific delays for each basic block that can be integrated in 
the BBA in a similar way as we do this for caches at the moment. Thus, the 
complexity of the problem should not grow dramatically. If we take the same for 
data caches into account, the use of model checking can reach an improvement 
of about 10% in average. 

More over, the described technique will allow for a more accurate path val- 
idation of WCET analysis by combining it with a high level analysis. In a high 
level analysis, for example, the mutual exclusiveness of parts of the paths can be 
proven. This can simply be exploited in our analysis by introducing new inputs 
rii at the branching blocks instead of the one input used now. By formulating 
an assumption over these inputs, the impossible paths through the control flow 
graph can be avoided and a more accurate WCET prediction can be achieved. 

5 Conclusion 

As a conclusion, our experimental environment for WCET analysis using model 
checking techniques and the evaluations with some typical embedded systems 
examples have shown, that this approach not only is able to produce valid exe- 
cution times, but also can improve the results of the analysis with respect to the 
well known abstract analysis. Furthermore, for the abstract cache analysis, our 
approach always produces correct runtime predictions, different from the ILP 
approach that sometimes suffer from numerical instabilities. 

Altogether we have shown, that model checking can be used quite well to 
solve the problem of determining execution time bounds. The application in a 
verification framework for the development of embedded systems opens a wide 
area of coupling the WCET analysis directly to the design in a CASE tool and 
promises further use cases, for example the combination with property checking 
for a high level path analysis, that can support the low level analysis, or the 
combination with automatic code generators. 
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Abstract. Regular model checking is a form of symbolic model check- 
ing for parameterized and infinite-state systems whose states can be rep- 
resented as words of arbitrary length over a finite alphabet, in which 
regular sets of words are used to represent sets of states. We present 
LTL(MSO), a combination of the logics MSO and LTL as a natural logic 
for expressing temporal properties to be verified in regular model check- 
ing. LTL(MSO) is a two-dimensional modal logic, where MSO is used for 
specifying properties of system states and transitions, and LTL is used 
for specifying temporal properties. In addition, the first-order quantifi- 
cation in MSO can be used to express properties parameterized on a 
position or process. We give a technique for model checking LTL(MSO), 
which is adapted from the automata-theoretic approach: a formula is 
translated to a (Biichi) transducer with a regular set of accepting states, 
and regular model checking techniques are used to search for models. We 
have implemented the technique and show its application to a number 
of parameterized algorithms from the literature. 



1 Introduction 

Regular model checking is a framework for algorithmic symbolic verification of 
parameterized and infinite-state systems [BLW03,KMM+01,WB98,BJNT00]. It 
considers systems whose states can be represented as finite words of arbitrary 
length over a finite alphabet, including array or ring-formed parameterized sys- 
tems with an arbitrary number of finite-state processes, and systems that oper- 
ate on queues, stacks, integers, and other linear unbounded data structures. In 
a system description, the set of initial states is represented as a regular set of 
strings, and the transition relation is given as a finite regular length-preserving 
transducer. Previous work on regular model checking [JN00,BJNT00,AJNd02] 
has developed methods for computing the set of reachable states of a system 
description, as well as the set of reachable loops, obtained from the transitive 
closure of the transition relation. In general, this problem is undecidable, but 
decidable results for certain classes have been obtained [JNOO]. 

* This work was supported in part by the European Commission (FET project AD- 
VANCE, contract No IST-1999-29082), and by the the Swedish Research Council 
(http:/ /www. vr.se/). 

R. Alur and D.A. Peled (Eds.): CAV 2004, LNCS 3114, pp. 348—360, 2004. 

(c) Springer- Verlag Berlin Heidelberg 2004 
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The techniques for computing reachable states and reachable loops can in 
principle be used to verify both safety and liveness properties of parameterized 
system descriptions, but do not provide a convenient approach for checking ar- 
bitrary temporal logic properties of parameterized and infinite-state systems. 
Significant ingenuity is required in order to manually transform the verification 
of a temporal property of a parameterized system into a property of reachable 
states and transitive closures, in particular if the verification uses fairness prop- 
erties that are parameterized on system components [BJNT00,PS00]. It would 
be desirable to have a framework, analogous to the automata-theoretic approach 
in finite-state model checking [VW86], where the property of verifying a tempo- 
ral property is automatically transformed into a problem of checking emptiness 
for a Biichi automaton. 

In this paper, we address this problem by presenting an extension of the 
automata-theoretic approach [VW86] to the setting of regular model checking. 
We present a logic for expressing system models and temporal properties, which 
is a combination of the logics MSO over finite words and LTL. We use MSO for 
specifying sets of states and transition relations and LTL for specifying temporal 
constraints. The result is a two-dimensional modal logic, where MSO is used in 
the “space” (system state) dimension and LTL is used in the “time” dimension. 
Models of the logic are infinite sequences of (constant-length) words, representing 
computations of the specified system. We can then specify a verification problem 
as the conjunction of a system specification and a negation of the property to 
be verified. 

Following the automata-theoretic approach, we present an automated trans- 
lation from the translation from a formula tp in LTL(MSO) to a Biichi trans- 
ducer, consisting of a regular set / of initial states, a regular length-preserving 
transducer T, and a regular set F of accepting states. Accepting runs of the 
Biichi transducer are infinite sequences of words, where the first word is in I, 
consecutive words satisfy T, and infinitely many words are in F. Thus, tp is 
satisfiable if and only if the Biichi transducer has an accepting run. Since T is 
length-preserving, the existence of an accepting run can be checked by searching 
for a reachable loop which contains a state in F. 

A nice feature of our combination of MSO with LTL is that we get the power 
to express parameterized temporal properties for free: MSO offers variables to 
represent positions and quantify over them, which can be interleaved with tem- 
poral operators. As a concrete example, for a parameterized mutual exclusion 
algorithm, a typical property one would want to express is the following. 

If all processes satisfy a weak fairness requirement, then each process 

that is interested in entering its critical section will eventually do so. 

If the number of processes is fixed, the terms like “each process” can be re- 
placed by explicit conjunctions to obtain a standard model checking problem in 
propositional temporal logic. However, in regular model checking the number of 
processes is arbitrary. Fortunately, our logic gives quantification for free, so we 
can express this property directly, by a formula like 




350 



Parosh Aziz Abdulla et al. 



Vi.[n^(blocked(z) V progressing(z))] — > Vz.[n(trying(z) ^ Ocritical(z))] 

where i ranges over positions in the state, and each position represents a pro- 
cess. In this formula, we apply LTL operators (□ and O) to formulas with the 
MSO variable i, and later use MSO quantification over i to express parame- 
terized properties. In our logic LTL(MSO), temporal operators can be applied 
to formulas with at most one free first-order variable and no free second-order 
variables. This restriction allows to express parameterized temporal properties 
(e.g., fairness constraints) of individual processes in a parameterized system, as 
well as temporal properties of pairs of adjacent processes (in positions i and z-l- 1 
using one free variable i). 

A further nice property of adapting the automata-theoretic approach is that 
our transformation results in a uniform problem of checking for accepting runs, 
for which we can develop techniques that are more uniform than those presented 
in previous work [JN00,BJNT00,AJNd02]. We have extended our tool for regular 
model checking [AJNd03] to check whether Biichi transducers have accepting 
runs. This is done in two steps. First, the set of reachable states are computed 
as Inv = I o T*. Secondly, loops are found by identifying identical pairs in 
{F nT D Inv)oT*. This computation is more uniform and more efficient than the 
approach to verification of temporal logic properties outlined in [BJNTOO], which 
builds on computation of the transitive closure T+ of the transducer relation. 
We have verified safety properties with the tool for many of the examples in our 
previous work, as well as liveness properties for some of the examples. 

As special cases, when the formula contains no temporal operators, our 
method specializes into a decision procedure for MSO similar to that of MONA 
[HJJ"'"96], and when the formula contains no quantifiers our method specializes 
to ordinary LTL model checking. 

The remainder of the paper is structured as follows. In the next section, we 
present the logic LTL(MSO). Section 3 illustrates how it can be used to model 
and specify parameterized algorithms. The model checking technique, including 
the translation to Biichi transducer is presented and proven correct in Section 4. 
New techniques and evaluations of our extended implementation are presented 
in Section 5. 

Related Work. In addition to the work on regular model checking, cited earlier, 
there is a large body of research on the problem of model checking parame- 
terized systems of identical processes, in which there is no ordering between 
processes, and hence the system state can be represented as a multiset of pro- 
cess states (e.g., [BLS01,Del00,EK03,EK00,GS92]). This problem is substantially 
simpler, since ordering between processes need not be considered. Emerson and 
Namjoshi [EN95] give a technique for verifying a restricted class of parameter- 
ized token-passing algorithms by reducing an arbitrary ring to a small fixed-size 
ring under certain conditions. These restrictions are substantially stronger than 
in our framework. Sistla [Sis97] uses Biichi automata over two dimensional lan- 
guages (reminding of transducers) to specify network invariants when verifying 
systems by induction over their linear process structure. It is unclear what class 
of systems can be handled automatically by this technique. 
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The problem of checking liveness properties of array-shaped parameterized 
systems was considered by Pnueli and Shahar [PSOO] , who presented a technique 
for computing the transitive closure of a restricted class of transition relations. 
They also first manually employ abstractions to make the implementation ter- 
minate. 

Pnueli, Xu, and Zuck [PXZ02] present an interesting use of specialized ab- 
stractions in order to prove absence of starvation properties for Szymanski’s 
algorithm and the Bakery algorithm. The abstractions keep track of the number 
of processes with certain properties, and generate a finite-state system, which 
can be model-checked. The presented abstraction is specialized to prove non- 
starvation, and loses much information so that e.g., safety properties can no 
longer be checked. 

Our logic LTL(MSO) applied to words is related to existential monadic 
second-order logic (EMSO) on grids to define picture languages accepted by tiling 
systems (see e.g. [GR97]). Indeed, transducers over words can be considered as 
tiling systems where each transition represents a tile. Thus, it is expected that 
our logic LTL(MSO) is equivalent to EMSO on grids. However, the two logics 
come from different motivations. While EMSO on grids is used to reason about 
pictures, our logic is used to reason about parameterized structures over time. 
When applied to the word structure, the two logics coincide. 



2 The Logic LTL(MSO) 

We present the syntax and semantics of LTL(MSO). This logic combines the 
operators of MSO and LTL, under the restriction that temporal operators can 
be applied to formulas with at most one free first-order (MSO) variable and 
no free second-order variables. In this paper, we restrict the presentation to the 
temporal operators □ (always) and O (eventually), but this is not a fundamental 
restriction. 



Syntax. We denote first-order variables Varpo by lowercase letters i,j, k, . . . , 
second-order variables Varso by uppercase letters I,J,K,... and configuration 
variables V by lowercase letters x,y,z, . . . 






fG/|/CJ|i = j-|-l| x[i] I 
true I false 
ip If, { A , V, 

3iip I yi(fi I 3I(f I 

□ (/? I 0(fi 



x'\i] Atomic MSO formulas 
Boolean constants 
I —‘p Propositional connectives 
MSO Quantification 
Temporal operators 



No negation should have any temporal operator within its scope, which is why 
we provide dual versions of each operator. We further require that in every 
subformula of the form □(/? or <>(p, there is at most one free first-order variable 
(and no free second-order variables). More operators on first-order variables, 
such as <, <, =, can be expressed in MSO using standard encodings. 
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Semantics. Closed LTL(MSO) formulas are interpreted over matrices M over 
2^ of dimension oo x n where n S Z+ is a parameter that can take an arbi- 
trary positive integer value. We call the vertical (first) dimension time, and the 
horizontal (second) dimension space. Let Z„ = 1}. The element 

M{t, i) e 2^ for t e N and i £ Zn represents the system configuration at time t 
of position (or subsystem) i, which assigns truth values to the configuration vari- 
ables V. Thus, each row M{t,0) ■ ■ ■ M{t, n) represents the system configuration 
at time t. 

In general, a formula ip depends on its free first- and second-order variables, 
which are interpreted in the space dimension, and on a time point t. A model 
Ai for an arbitrary LT L{ MS O) -for mnla, ^ is a triple {M,I,t), where M : (N x 



Z„) ^ 2^ is a matrix over 2^, where X is a valuation X: Var^o and X: 

Vargo ^ 2^" of first-order and second-order variables, and where t is a time 


point t € N. Satisfaction of formulas is defined as follows. 


(M, X, t) ^ false 
(M, X, t) \= true 




i G I 


iff X(i) G X(/) 


ICJ 


iffX(J) C X(J) 


i= j + f 


iffX(z) =X(j) + l 


x[i] 


iff x G M{t,I{i)) 


x'[i\ 


iff x G M(t -I- 1, X(t)) 


ifl A (f2 


iff \= ipi and {M,I,t) \= tp 2 


ipi V (f2 


iff (M,X,t) ^ ipi or (M,X,t) |= tp 2 


ipi (f2 


iff (M,X, t) \= ip 1 implies {M,X,f) ^ ip 2 


ipi ^ (f2 


iff (M, X, t) 1= ipi if and only if (M, X, t) |= ip 2 


-nip 


iff (M,X,t) ^ip 


3iip 


iff 3m G Z„ {M,I[i ^ m],f) \= p 


'iiip 


iff Vm G Z„ [M,I[i ^ m],t) \= p 


3Iip 


iff dS'G 2^" {M,X[I^S],f) h V 


yiip 


iffVS'G 2^- \m,X[I^ S],t) h 


dip 


iff VF > t {M,X,t') \= p 


Oip 


iff 3t' > t (M,X, t ) \= p 



where T[i i-^- m] is the valuation which sends i to m and otherwise acts as X, 
and 1[1 1 -^ S'] is defined analogously. 



3 Modeling in LTL(MSO) 

In this section, we discuss how to model systems and set up a verification problem 
in LTL(MSO). 

A system model consists of a set of initial states, a transition relation, and 
possibly fairness properties. Let a state formula be a formula without temporal 
operators or primed configuration variables. Following the style of TLA [Lam91], 
the initial states are specified by a state formula tpi, the transition relation by a 
formula ipx over unprimed and primed configuration variables without temporal 
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operators, and fairness properties by a formula (pfair- A property is given as a 
formula p] for instance, an invariant property is of the form Opj„y for a state 
formula pinv To check whether the system model satisfies the property p, we 
search for models of the formula 



Pj a Upx A Pfair A ~^P . 

If is a safety property, the fairness properties pfair are not necessary, and can 
be omitted. 

There are several forms of fairness properties. Process fairness can be ex- 
pressed as 

</5/oir = VzDO(Al(t) V ^enabled(Al(i)) 

where A{i) represents all actions of process i, and enabled(A(z)) represents the 
set of states where the action A{i) can be taken, obtained using an existential 
quantification of the primed configuration variables in A{i ) . 

Example. To illustration modeling in LTL(MSO), we take the Bakery algorithm 
for mutual exclusion. This algorithm synchronizes entries into critical section 
for an arbitrary number of processes, using a mechanism with integer-numbered 
“tickets” . A process which wants to get into the critical section receives a ticket 
which is the maximum of all the outstanding tickets plus one. When a process 
has the lowest outstanding ticket, it enters the critical section and drops the 
ticket when leaving. Below is a pseudo-code description of the algorithm, in 
which the ticket of process p is represented by the variable ticketp, whose value 
is initially 0. 

Idle; ticketp := 1 -I- max ticketq 

Q 

Waiting: await \/q ^ p ■. {ticketp < ticketq V ticketq = 0) 

Critical: ticketp := 0 

Fig. 1. Bakery Algorithm 



To model the Bakery Algorithm in LTL{MSO), we change the perspective; 
rather than modeling the vector of process states, we let a configuration represent 
the states of the sequence of ticket numbers, using the configuration variable q. 
For each i, the value of q[i] is 

— T if there is no process that has ticket i + 1, and 

— W if some process with ticket t -I- 1 is waiting, and 

— C if some process with ticket i -|- 1 is in critical 

Note that we do not model tickets with number 0, since this is the ticket number 
of all “inactive” processes. For brevity, we implicitly use the invariant that each 
positive ticket number can be held by at most one process. This invariant can 
be verified separately, or not be assumed (e.g., by adding one more value of q[i] 
representing that several process have this ticket number) . 
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The initial state and transition relation of the Bakery Algorithm can then be 
specified by the following formulas. 



initial 
ticket (?) 

enter(?) 

exit(?) 

copy(i) 

actions(i) 

sys 



Vz q[i] =_L 

q[i] =_L A q'[i] = W A (Vj > i q[j] =_L) A 
(i ^ 0 ^ q[i - 1] yf_L) 
q[i\ = W A q'[i] = C A (Vj < i q[j] =_L) 
q[i] = C A q'[i] =_L 

q[i] = ?'[*] 

(ticket(i) V enter(z) V exit(z)) A (Vj yf i copy(j)) 
initial A n((3z actions(z)) V Vz copy(z)) 



Mutual exclusion can be specified by the formula 



mutex = D^(3z3j i ^ j A ^[z] = C A q[j] = C) 



In order to specify non-starvation, we add a fairness constraint for the actions 
enter (z) and exit(z). We add no fairness constraint for ticket (z), since the arrival 
of new processes should not be controlled by the algorithm itself. 

fairactions(z) = (enter(z) V exit(z)) A (Vj yf z copy(j)) 
fairness = Vz □O(fairactions(z) V ^enabled(fairactions(z))) 

non-starvation = Vz □ (^[z] = W^ <><?[*] = C) 

To check that the algorithm satisfies mutual exclusion and non-starvation, we 
should check that the formulas 



sys A ^ mutex 

sys A fairness A ^non-starvation 

do not have any models. The property that models are of arbitrary but fixed size 
implies that we actually verify the algorithm under the assumption that there 
is an arbitrarily chosen upper bound on ticket numbers. For safety properties, 
this is not a limitation since violations will be finite sequences of computation 
steps, but for fairness properties it can play a role. For the Bakery Algorithm, 
it can be seen that an arbitrary upper limit on ticket numbers does not affect 
non-starvation for waiting processes, but in general one must be aware of this 
modeling constraint. 



4 Translating LTL(MSO) to Biichi Transducers 

In this section, we describe how to transform a formula in LTL(MSO) into a 
Biichi transducer, consisting of a regular set / of initial states, a regular length- 
preserving transducer T, and a regular set F of accepting states. In this section, 
we will represent such a Biichi transducer by the LTL(MSO) formula (j)i A A 
nO(j>p, where 4>i and 4>f are MSO formulas denoting the regular sets of initial 
and accepting states, and 4>t is an MSO formula over primed and unprimed 
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configuration variables denoting the regular transition relation. The accepting 
runs of the Biichi transducer accept exactly the models of the temporal formula, 
so in the following we will consider (pj A Upx A OOpp to be a Biichi transducer 
where I, T, and F are regular transducers expressed in MSO. 

The idea of the construction is similar to that of the standard translation 
of propositional temporal logic to Biichi Automata [VW86]: the semantics of 
temporal operators is translated to additional state information in the Biichi 
automaton. In our case the operators are translated to new configuration vari- 
ables that represent the values of certain temporal subformulas of p. The seman- 
tics of these subformulas is represented by constraints on the values of the new 
configuration variables. 

We assume that p is in negative normal form. Define a core subformula of p 
as a subformula of p which has a temporal operator as its main connective. For 
each core subformula if of p, we introduce an auxiliary configuration variable x^. 
Intuitively, the value of at time point t and space position i should represent 
the value of ip at that entry in the matrix. For each core subformula Oxf where 
the main connective is O we introduce an auxiliary configuration variable yoi/> 
(called an eventuality variable). Intuitively, if the variable j/oi/) is true, then the 
formula tp should be true at some future time point. 

Using the auxiliary configuration variables, the value of any subformula tp 
can be represented by a formula in MSO over the extended set of configuration 
variables. Let us define this formula (l^p)) as follows: 



m 


= (p for (p in MSO 


{{ipi A 1P2)) 


= ((V'l)) A ((z/>2)) 


{{tpi V 1P2)) 


= ((V'l)) V {{1P2)) 


{(Blip)) 


= M^)) 


{{Viip)) 


= Vz((z/>)) 


{{ 3 hp)) 


= mi’)) 


iW)) 


= yim 


{{Dip)) 


— 


{{O1P)) 


— 



Let localconstr((/)) define the conjunction of a set of constraints on the auxiliary 
variables, called local constraints , as defined below. 

(a) For each auxiliary variable of form x.^, where i is the (possibly) free first- 
order variable in ip, define its constraint as follows. 

VL (xai^i [f] ^ ((V'i))M A [f] j when ip is □f/'i 

Vz. fxov-i [*] ^ ((V'l)) W V [z] j when ip is Oipi 

(b) Let z/ov>i 7 • ■ • 5 yoipk be the set of eventuality variables. We define their local 
constraint as: 

k 

A ' ((V’m))W) 

m—1 
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Furthermore, we require that all eventuality variables are false infinitely often 
and that they load new eventualities: let evconstr((/)) define the conjunction 
of a set of constraints on the auxiliary variables, called eventuality constraint, 
defined below. 



/\ Vz. A {y'o^Ji] ^ 

m—1 

The goal of this section is to prove that for each formula <j) there is a model 
M such that {M,2,t) |= ^ iff there is a model M' such that (M',2,t) \= {{4>)) A 
□ localconstr(0) A □Oevconstr(i^) where M' differs from M only on the 
auxiliary variables. By its form, the latter formula can be regarded as a Biichi 
transducer, and so we have translated (j) into an equivalent Biichi transducer. 
We first prove the following lemma. 

Lemma 1. If {M',I,f) ^ □localconstr(</>) A CO©vconst)r(^^j then fov nil 
auxiliary variables of (j) we have 

(a) 1= yi.xui,[i] — > 

(b) (M',J,t) \=yi.xo-^[i ] — > oiA)) V Oyo-^[i] 

(c) {M',X,t) 1= Vz.yov-W — ' ^((A) 

Proof, (a) Let |= Xq^[z] for some J' = I[i ^ m]. Since {M' ,I,f) \= 

□localconstr((/)), we have 

{M',l',t) h □ {xui,[i] ^ {{tpMi] Ax'o^H) 

It follows that {M',T',t') \= {{if))[i] for every t' > t. 

(b) Let {M',J',t) \= xoifi[i] for some J' = J[i m]. Since (M',I,t) |= 
□constr((/)) A □Oevconstr((/)), we have 

- {M',T,t) h □ (xoi^li] ^ {{'ip))[i] V 

- {M',T,t') 1= y'^,p[i] ^ x'^,p[i] for some t' > t 

If {M',l',t") ^ iii’i'i))) for some t” > t we are done, suppose that no such 
t” exists. Then it follows that |= and thus \= 

yoi,[i]- 

(c) Let {M',l',t) 1= yotpii] for some J' = I[i ^ m], and let t' > the the earliest 
point in time when 

{M',l',t') \= ^yotp[i] 

Then we have 

(M',I',t' - 1) \= yoil,[i] A ^2/ov-W 
and by the eventuality constraint of j/oi/; we get 

Since t' — 1 > t we conclude that 



hO((^W)) 
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Theorem 1. For each subformula ip of 4>, interpretation I and timepoint t we 
have there is a model M such that {M,2,f) \= ip ijf there is a model M' such 
that {M',2,f) \= {{ip)) A □localconstr(0) A □Oevconstr((^) where M differs 
from M' only in the auxiliary variables of (p. 

Proof. This case is rather straight-forward. Define M^{i) by letting 
be true at point M'^{t,i) iff {M,T,t) |= ip{i). Then the local constraints follow 
by the definitions of the temporal operators. The values of variables yoip{i) Etre 
then defined in a way that satisfies the eventuality constraint: 

— at time t, all variables y<ytp[i] are set to false; 

— if all yotp[i] are false at time t', then set the value of each yo-<p[i] at time t' + 1 
to the value of corresponding xc,ip[i] at time t' + 1] 

— if variable yoV'W is true at time t' , and {{ip{i))) holds, then set yoii>[i] to false 
at time t' -I- 1. 

<1=: Let {M',I,t) |= (fipj) A □localconstr((^) A DOevconstr ((/)). We prove 
by induction over the structure of ip. 

Ip G MSO : Since ((^/>)) = ip, we get {M',J,t) ^ ip. 

Ip = 3iipo : We get {M' , I, t) ^ 3i(Pypff) and by semantics {M' ,I[i ^ m],f) ^ 
((V’o)) for some m G Z. Since □localconstr((/)) A □Oevconstr((/)) is a closed 
formula and thus does not depend on i, it follows that {M',l[i m],t) |= 

□localconstr((/)) A □Oevconstr(0). By induction there is some M that 
differs from M' only in the auxiliary variables of ipo such that {M,X[i 
m],t) \=ipo. By semantics we get {M,I,f) |= ^iipo. 

Ip = {3IipQ,yiipo,yiipo} Similar to ip = 3iipo. 

Ip = dipQ : We get {M',J,t) ^ Xn^o- Lemma 1 it follows that {M',J,t) |= 
0{{iPq)). Then by semantics for all t' > t, we have {M',I,t') \= {{ipo)) and 
then by induction there is some M that differs from M' only in the auxiliary 
variables of ipo such that {M,I,t') ^ ipo. Since this holds for all t' > t we 
have by semantics that {M,I,t) |= □V’o- 
Ip = Oipo : We get {M',X,t) \= xotpo- Ly Lemma 1 it follows that {M',I,t) \= 
^((■0o)) V ^yoipo- We have {M',X,t') \= {{ipo)) for some f >t in both cases 
as follows. In the case (M',I,t) ^ ^yo^o holds, it follows by Lemma 1, 
and in the case {M',X,t) ^ ^((V'o)) holds it follows from semantics. Then 
by induction there is some M that differs from M' only in the auxiliary 
variables of ipo such that {M,I,t') |= ipo. Since t' > t we have by semantics 
that (M,X,t) \= Oipo. □ 

Corollary 1. For a formula (p, interpretation I and timepoint t we have that 
there is a model M such that {M,X,f) \= (p there is a model M' such that 
{M',X,t) 1= (((/))) A □localconstr((/)) A □Oevconstr((/)) where M differs from 
M' only in the auxiliary variables of (p. 

5 Implementation 

The transformation of the previous section results in a formula of the form 



/ A DT A aOF 
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where the formulas I, T, and F are in MSO and can be translated into trans- 
ducers. The transducers can be seen as binary relations on words that are closed 
under standard operations such as union, intersection and composition. For the 
sake of presentation, we will assume that I only contains non-primed configu- 
ration variables and look at / as a set instead of a binary relation. The general 
case does not introduce any significant additional problems. 

We find models of the formula I A DT A □OF’ as follows. 

(a) Compute the set of reachable states as Inv = I o T*. 

(b) Define as {(■u;, w') G T : w G Inv} and compute the set of loops by 
finding identical pairs in (T/„„ n F) oT* 

We use an optimized version of the above scheme as follows. When computing the 
invariant loT* the transition relation T contains local constraints for eventuality 
variables. For eventualities that hold infinitely often, such local constraints are 
only needed when checking for loops through F and not when computing the 
set of reachable states. To simplify the transition relation used when computing 
the set of reachable states, the formula is transformed into the following form: 

/ A GTo A 0(F A GT A □OF’) 

where To is like T but without the local constraints on eventuality variables for 
eventualities that hold infinitely often. The final procedure is then as follows. 

(a) Compute the set of reachable states Inv = I oTq. 

(b) As before, define T/„„ as {(w,m;') G T : w G Inv} and compute the set of 
loops by finding identical pairs in (T/„„ C F) oT*. 

To calculate expressions like T 10 T 2 *, we employ a technique for computing ar- 
bitrary relational compositions of transducers, expressed as regular expressions, 
by encoding the states of the automaton recognizing the regular expression in 
the first position of the word. As an example, the expression Ti 0 T 2 corresponds 
to an automaton with two states qi, and ( 72 - We use a transducer accepting the 
union of (gi, ( 72 ) 'Ti and (( 72 , ( 72 ) 'T 2 . Words in the transitive closure of the result- 
ing transducer, of the form (gi, < 72 ) • ^^[1]) ' ’ ’ then represent 

an element {w, w') in the relation Ti o T 2 . 

We have successfully verified safety properties for many of the examples in 
our previous work, as well as liveness properties for some of the smaller examples. 
Execution times are given in the table below. 



Algorithm 


Safety (ms) 


Liveness (ms) 


Tokenpass 


3930 


14790 


Tokenring 


4840 


29590 


Bakery 


4840 


23070 


Burns 


38560 




Dijkstra 


593220 




Szymanski 


1073200 
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We experience high execution times for checking safety properties for some 
of the algorithms. This is because we use the entire transition as opposed to 
computing the transitive closure of separate actions and using them for reach- 
ability analysis. This reachability analysis does not always terminate, however, 
since some algorithms require the transitive closure of combinations of actions for 
the analysis to terminate. Using the entire transition relation is a more general 
technique, but potentially more costly. 
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Abstract. The model checking of a counters system S often reduces to 
the effective computation of the set of predecessors Preg(A) of a set of 
integer vectors X. Because the exact computation of this set is not pos- 
sible in general, we are interested in characterizing the minimal Number 
Decision Diagrams (NDD) [WBOO] that represents the set Pre-*^(A). In 
particular, its size is proved to be just polynomially bounded in k when 
S' is a counters system with a finite monoid [FL02], explaining why there 
is no exponential blow up in k. 

1 Introduction 

Model checking infinite-state transition systems S often reduces to the effective 
computation of the potentially infinite set of predecessors Pre^. More precisely, 
the safety model checking can be expressed as the following problem: 

— Given as inputs an infinite-state transition system S and two possibly infinite 
sets Xq and X of respectively initial states and non-safe states, decide if 
Xq n Pre 5 (A) is empty. 

Infinite Sets. In order to effectively compute Pre 5 (A), one generally needs 
to find a class of infinite sets which has the following properties: closure under 
union, closure under Preg, membership and inclusion are decidable with a good 
complexity, and there exists a canonical representation. We are considering the 
Number Decision Diagrams (NDD) that provides an automata-based symbolic 
representation of some subsets of N'". 

Infinite- State Transition Systems. We will focus on systems S with m inte- 
ger variables and more precisely on counters systems with a finite monoid (also 
known as finite linear systems [FL02]), a class of systems that contains the re- 
set/transfer Petri Nets [DJS99], generalized broadcast protocols [EN98, DelOO], 
and all the counters automata. As this model is very general and powerful, the 
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price to pay is the undecidability of reachability properties and in particular the 
sequence (Pre^^(X))fc does not converge in general. 

Our Image Computation Problem. The characterization of the NDD struc- 
ture that represents Pre^^(Ai) in function of k is an important problem in order 
to effectively compute the exact limit Preg(Jf) or a “good” over-approximation: 

— When there exists an integer ko such that Pre 5 (X) = Pre^^”(X), the char- 
acterization can be useful in order to design an efficient algorithm that 
incrementally computes these sets. Recall that even if the convergence of 
(Pre|^(X))fe is not guaranteed by the theory, in practice we observe that 
often, this sequence converges [DelOO, BB03] and it often converges quickly 
[Bra, Bab]. Moreover, as soon as the set X is upward closed and S' is a Well 
Structured Transition System [FSOl], the convergence is insured. 

— When the sequence (Pre|^(X))fe diverges, the characterization can be useful 
in order to design NDD specialized acceleration operators that computes the 
exact limit Preg(X) [BLW03, FL02] or an over-approximation [BGP99]. 

Related Works. We use the approach called the regular model checking', for 
channel systems, Semi-Linear Regular Expressions [FPSOO] and Constrained 
Queue-content Decision Diagrams [BH99] have been proposed; for lossy chan- 
nel systems [ABJ98], the tools Lcs (in the more general tool TreX [ABSOl] 
[Tre] ) uses the downward-closed regular languages and the corresponding subset 
of Simple Regular Expressions for sets and it represents them by finite automata 
to compute Post*; for stack automata, regular expressions or finite automata are 
sufficient to represent Pre* and Post* [BEF+OOj; for Petri nets and parameterized 
rings, [F097] uses regular languages and Presburger arithmetics (and acceler- 
ation) for sets. For Transfer and Reset Petri nets [DFS98], the tool Babylon 
[Bab] utilizes the upward closed sets and represents them by Covering Sharing 
Trees [DRVOl], a variant of BDD; for counters automata, the tool Brain [Bra] 
uses linear sets and represent them by their linear bases and periods; Mona 
[Mon] [KMS02] and FMona [BFOO] use formula in WSIS to represent sets; the 
tool CSL-ALV [BCP97] [Alv] uses linear arithmetic constraints for sets and 
manipulates formula with the Omega solver and the automata library of Lash. 

For counters systems with a finite monoid, tools Fast [Fas], [FL02], [BFLP03] 
and Lash [Las] utilize semi-linear sets and represents them by NDD, moreover, 
these two tools are able to accelerate loops [Boi03] [FL02] . In [FL04] , the NDD 
Pres (A) is proved to be computable in polynomial time in function of the NDD 
X for a large class of systems S that contains all the counters system with a finite 
(or infinite) monoid. Moreover, the size of the NDD that represents Pre|^(A) is 
proved to be polynomial in k when S and X are “defined in the interval- logic” , 
a restrictive class compared to the sets that can be represented by NDD. 

Our Results. 

1. We prove that the asymptotic number of states of the minimal NDD that 
represents Pre^^(A) is polynomial in k for any counters systems S with a 
finite monoid and for any set X represented by a NDD. 
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2. We show that the structure of the minimal NDD that represents Pref is 
similar to a BDD. That provides a new way for implementing a NDD library 
using all the BDD techniques for speeding-up the computation like cache- 
computation, minimization in linear time, Strong canonical form, well-known 
in the field of BDD [Bry92] . 

Plan of the Paper. The structure of the minimal NDD that represents a set 
X is given in section 3. In the next one, the definition of counters systems with 
a finite monoid is recall. The structure of the minimal NDD that represents 
Pre^^(X) in function of k, is studied in the last section 5. 

2 Preliminaries 

The cardinal of a finite set X is written card(X). The set of rational numbers, 
non negative rational numbers, integers and positive integers are respectively 
written Q, Q+, Z and N. The set of vectors with m > 1 components in a set X 
is written X'^. The i-th component of a vector x G X"^ is written Xi G X; we 
have X = (xi, . . . , x^)- For any vector v, v' G Q™ and for any t G Q, we define 
t.v and v + v' in Q™ by {t.v)i = t.Vi and {v + v')i = Vi + v^. For any x G Q*”, we 
define ||a;||^ = maxi(|a;i|). 

The set of square matrices of size m in K C Q is written Mm(X). The 
element My G K is the i-th raw and j-th column of a matrix M G 
The identity matrix is written /„. The vector M.x G is naturally defined by 
{M.x)i = MijXj. The subset M~^Y C Q™ is defined by M~^Y = {a; G 

Q'"; M.x G P} for any M G M(Q) and X C Q™. For any M G Mm(Q), we 
define ||M||^ = maxjy (|My |). 

The set of words over a finite alphabet X is written X*. The concatenation 
of two words a and a' in X* is written a.a' . The empty word in X* is written e. 
The residue of a language £ C N"* by a word cr G If* is defined by 

a~^.L = {w G X*; a.w G H} 

A deterministic and complete automaton A is a tuple A = (Q, P, <^7 <Z07 F); Q 
is the finite set of states, X is the finite alphabet, S : Qx X ^ Q is the transition 
relation, Qo ^ Q is the set of initial states and F C Q is the set of final states. 
As usual, we extends 6 over Q x X* such that S{q,a.a') = S{S{q,a),a'). The 
language L{A) C X* accepted by a deterministic and complete automaton A is 
defined by F{A) = {cr G X*\ S{qo,a) G F}. 



3 Number Decision Diagrams 

Recall that there exist two natural ways in order to associate to a word cr a 
vector in N™ following that the first letter of a is considered as an “high bit” or 
a “low bit” . In this article, we consider the “low bit” representation (even if the 
other one, just seems to be symmetrical, results proved in the paper cannot be 
easily extended to the other one). 
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Let us consider an integer r > 2 called the basis of the decomposition and an 
integer m > 1 called the dimension of the represented vectors. A digit vector b is 
an element of the finite alphabet = {0, . . . ,r — 1}'". The vector p{a) G N™ 
associated to a word cr = 6i . . . of n > 1 digit vectors bi G Sr^ is defined by 
p{a) = X)r=i f"^~^.bi. We naturally define p(e) = (0, . . . , 0). 

Definition 1 ([WBOO], [BC96]). A Number Decision Diagram (NDD) A is 
a finite deterministic and complete automaton over the alphabet Erm such that: 

p-\pmA))) = n{A) 

The subset X = p{Z{A)) C N"* is called the set represented by the NDD A. 
Such a subset X is said NDD- definable. 

Remark 1. Thanks to the condition p~^{p{L{A))) = fi{A), the set N’”\A" is 
represented by the NDD A = {Q, i5, Q\F). Recall that from any deter- 

ministic and complete binary automaton A, we can efficiently computes a NDD 
A' such that p{L{A)) = p{L{A')) [KMS02, Ler03]. 

Remark 2. Any Presburger definable set (a set defined by a formula in the first 
order logic (N, -|-,<)) [BC96, WBOO], or any semi-linear set (a set equal to a 
finite union of sets of the form xq N.p where xq € N'" and P is a finite 

subset of N™) [GS66, Reu89] can be effectively represented by a NDD. Moreover, 
recall that a set is NDD-definable if and only if it is definable by a formula in 
the first order logic (N, -|-,<,W) where is the valuation function in base r 
defined hy y = Vr{x) if and only if y is the greatest power of r that divides x 
[BHMV94], 

In the remaining of this section, we characterize the minimal NDD that 
represents a subset X. 

The equality p{<J.a') = p{<j) rl'^Lp(CT') shows that the function ■ N"* ^ 
N™ defined by 7 o-(x) = p{a) plays an important role. We are going to 

prove that X is NDD-definable if and only if the following set Q{X) is finite: 

Q(X) = {7-i(X); 

Remark that for any digit vector b G Sr^ and for any q G Q{X), the set %^{q) 
remains in Q{X). Hence, if Q{X) is finite, we can easily associate to a set X a 
deterministic and complete automaton A{X). 

Definition 2. Let X C N™ be such that Q{X) = {y~^{X)] a G Efm} is finite. 
The deterministic and complete automaton A{X) is defined by: 

(A{X) = {Q{X),Erm,S,qo,F) 

I Hq,b) = %\q) 

]qo = X 

[F={qGQ{X); (0, . . . , 0) G (?} 



We are going to prove that A{X) is the unique minimal NDD that represents X. 
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'hem.m.a. 1. For any and we have a ^.p ^{X)=p 

Proof. We have w G a~^ .p~^{X) iff a.w G p~^{X) iff p{a.w) G X iff "ia{p{w)) G 

X iff pH G 7"H^) iff G n 

The following theorem is really important because it proves that the structure 
of the minimal NDD that represents a set X can be obtained just by studying 
the sets 

Theorem 1. A set X C N™ is NDD-definable if and only if Q{X) is finite. 
Moreover, in this case, A{X) is the unique minimal NDD that represents X . 

Proof. Assume that Q{X) is a finite set. We are going to show that A{X) is 
a NDD that represents X by proving C(A.(A)) = p~^{X). By definition of 
A{X), we have a G fi{A{X)) iff (0, . . . , 0) G 7 “^(A). Therefore a G C(A(A)) iff 
p(cr) = 7cr((0, ... ,0)) G X. Hence, we have proved that C(A(A)) = p~^\x). In 
particular p(C(A(A))) = X and p~^{p{L{A{X)))) = C(A.(A)). We have proved 
that A{X) is an NDD that represents X. 

Now, assume that X is NDD-definable and let us prove that Q{X) is fi- 
nite. There exists a NDD A such that X is represented by A. Let C be the 
regular language accepted by A. As A is an NDD that represents X, we have 
p~^{p{L)) = C and p{L) = X. We deduce C = p~^{X). As the minimal de- 
terministic and complete automaton that recognizes L is unique, there exists 
a unique minimal automaton that represents X. Recall that the set of states 
of this minimal automaton is given by a G Sfm}. From lemma 1, we 

deduce that Q{X) = {p(cr“^.C); a G A*m}. Therefore, Q{X) is finite and by 
uniqueness of the minimal automaton, A{X) is the unique minimal NDD that 
represents X. □ 

4 Counters Automata with Finite Monoid 

The class of counters automata with finite monoid finite (a.k.a finite linear sys- 
tems [FL02]) is a natural extension of some classes of models like Reset/Transfer 
Petri Nets [DJS99], counter automata or broadcast protocols [EN98, DelOO]. Re- 
call that this class is also used as the input model of the accelerated symbolic 
model checker Fast [BFLP03]. 

Let us first provide the definition of a counters system. 

Definition 3. A NDD-linear function f is a tuple f = (D,M,v) such that 
D C N™ is NDD-definahle, M G Mm(Z) and v G Z™. 

Without any ambiguity, we also denote by / the function f : D ^ N™ de- 
fined by f{x) = M.x + V for any x G D. The composition of two NDD-linear 
functions is naturally defined by (Di, Mi,vi) o (D 2 , M 2 , V 2 ) = (D 2 n Mf~^(Di — 
V2), Ml. M2, M1.V2 + Hi). 

Definition 4. A counters system S (a.k.a linear system [FL02]) is a tuple S = 
(X, fs) where X is a finite set of actions and fs = {fa, a, G X{ is a finite set 
of NDD-linear functions. 
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For any word a = bi ... bn of n > I actions bi G S, the NDD-linear function 
fa is defined as fa = fb^ ° • fbi ■ The NDD-linear function fc is defined by 

/e = (N™, /, (0, . . . , 0)). We denote by {Da, Ma, Va) the NDD-linear function fa. 
Like in [FL02], we define the monoid of S. 

Definition 5 ([FL02]). The monoid multiplicatively generated by the square 
matrices Ma is called the monoid of S and written Mg = {Ma~, u € S*}. 

Definition 6. A counters system S such that Ms is finite is called a counter 
system with a finite monoid (a.k.a finite linear system [FL02]). 

Remark 3. The class of counters systems with a finite monoid enjoys good prop- 
erties that allow to easily accelerate the computation of the reachability set 
[FL02, Boi03, Ler03]. 

Finally, let us recall the definition of the set of immediate predecessors. 

Definition 7. Let S be a counters system. The set Pres(W) of immediate pre- 
decessors of a set X is defined by Pres(-^) = Uaen faHX). 

5 Structure of the Minimal NDD yi(Pre^^(X)) 

In [FL04], we have proved that for any counters system S, we can effectively 
computes in polynomial time a NDD that represents Pres(W) from any NDD 
that represents X. This result provides an exponential time algorithm for com- 
puting the minimal NDD yi(Pre^^(X)) in function of k. In fact, assume that 
each step of the computation multiplies the number of states of the NDD just 
by 2, then after k steps, the number of states of the NDD is multiplied by 2^. 
However, in practice, such an exponential blow up does not appear. 

To explain this experimental result, we are going to study the structure of the 
minimal NDD yi(Pre^^(Al)) in function of fc > 0. We prove an unexpected result: 
the NDD has a “BDD-like” structure and its number of states is polynomial in k 
for any counter systems S with a finite monoid and for any set X NDD-definable. 
We first prove a technical lemma. 

Lemma 2. Let X C N™ be a NDD-definable set, M G M(Z) and a G Q"*". There 
exists a finite class Cx,M,a such that for any v G and for any w G S*m, we 
have: 

IkIL < ^ n M-\X - v)) G Cx,M,a 

Proof. Let v G and w G E*m such that ||u||^ < We have: 

n M~\X - u)) = N™ n n m~\x - v )- p{w)) 

= N-n M-\^^{X-v-M.p{w))) 
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From the equality X = 7wo(7wo(^))> deduce: 



7-'(N’-nM-i(X-u)) 

= U [n-CM-i [7-J(X) + 



p(wo) - V- M.p(w)\\ 

r\M ) ) 



Let B = {z & Z™; ||z||g^ < l + a + m. ||AF||^} and let us prove that for any wq G 
4-, if N'" n M-l (74(X) + Pi^o)-v-M.p{w) \^ ^ 0 p(n,o)-v-M.p[w) ^ ^ 

In fact, in this case, there exists xq G such that Xq + g 

M.N™ C Z'". Therefore p^'>^o)-'u-M.p{w) ^ ^ Moreover, we have the following 

inequality: 

p{w[j) — V — M.p{w) 

J,\w\ 

< 1 + a + TO. ||M||^ 



< 



(rl“l - 1) +a.rl“l +to. - 1) 

r\w\ 



Now, just remark that the following finite class Cx,M,a satisfies the lemma: 



Cx,M,a = ■{ [J N" 

iq,b)&F 



n Af-i(g + 6 ); F C Q{X) x B 



□ 



Theorem 2. Let S he a counters system with a finite monoid and X he a NDD- 
definahle set. There exists a finite class Cs,x of subsets of N™ such that for any 
w G Sfm and for any L C N'-’’' we have: 




Proof. Let a = max(M.a)e 3 vtsxi: l|fi^-fa|loo- We define the function g„ : Q™ ^ 
Q"" by g^{x) = M„.x + for any a; G Q"", ct G T*. 

Let cr = oi . . . a„ be a word of n > 1 actions in E and w G Sfm be a word 
of vector digits such that \a\ < The sequence of prefixes (cTi)o<i<n of a 
is defined by CTi = oi . . . Ui. The set I{M, a, a) = {i G n}; = 

(M, u)} where {M,a) G Mg x is useful to compute the set 7 “^(/“^(X)) as it 
is shown by the following equality: 

=7-1 n • • • n gfil_fiDafi) n g4w) 

= n n 74(N'"nM-ip,-u.j) 

(M,a)GMsx£' iGl{M,a,(r) 
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Let Cx,M,a and be some finite classes satisfying lemma 2 for any (M, a) G 

Ms X 27. From v^. = we deduce ||uo-illoo ^ o;-* ^ “VI < 

Therefore, we have proved that 7 “^(/“^(A')) is in the following finite class Cg x- 

Cs,x=\r\Y^X'-.X'G U Cx,m,c-,FC IJ Cd^^m,o. 

[veF MeMs (m,o)gMsxi: 

Now, let Cs,x be the set of all finite unions of elements in Cg From the equality 
fa\X)) = U,re£ V)). we deduce that VH^)) 

G Cs,x for any L C 77-’’'”'. □ 

We can deduce many interesting results from the previous theorem. The first 
unexpected one is about the asymptotic number of states of the minimal NDD 
7l(Pre^^(X)) in function of k. 

Corollary 1. Let S be a counters system with a finite monoid and X he a NDD- 
definable set. There exists a constant cs,x such that the number of states of the 
minimal NDD that represents Pre^^(Jf) is bounded + cs,x- 

Proof. Let Cs,x be a class of finite subsets of N’" satisfying theorem 2 and let 
Xk = Pre|^(7f). From Xk = we deduce that for any w G 77*m 

such that k < we have "iff,^{Xk) G Cs,x- From theorem 1 we deduce that the 
set of states Q{Xk) of the minimal NDD that represents Xk satisfies Q{Xk) C 
Cs.x U { 7 “^(Xfc); < k}. From card({7“^(7ffe); rl“l < k}) < k^, we deduce 

that the cardinal of Q{Xk) is bounded by k™ + card(Cs,jc)- □ 

The previous corollary proves an experimental result : the number of counters 
TO is an exponential limitation for the effective computation of 7l(Pre^^(7f)) for 
large value of k. However, it explains why there is no exponential blow up in k. 
Now, let us study precisely the structure of 7l(Pre|^(7f)). 

Definition 8. A state q of a NDD A is said acyclic if the number of paths 
qo ^ q is finite. 

Corollary 2. Let S he a counters system with a finite monoid and let X he 
a NDD-definable set. The number of non acyclic states of the minimal NDD 
7l(Pre^^(X)) is hounded independently ofk. 

Proof. Let Cs,x be a class of finite subsets of N’" satisfying theorem 2 and let 
Xk = Pre|^(7f). We are going to prove that for any non acyclic state q of A{Xk), 
we have q G Cs,x- As the number of paths qo ^ q is infinite, there exists a path 
qo q such that rl'’’! > k. In this case 'y~^{Xk) G Cs,x- From q = 7“^(7ffe), 
we deduce q G Cs,x- Therefore, the number of non acyclic states of A{Xk) is 
bounded by card(Cs,jc)- □ 

Remark 4- The cardinal of the set Cs,x is used in the two corollaries. From the 
proof of lemma 2 and the proof of theorem 2, this cardinal can be easily bounded 
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by an elementary function in the size of S, A.{X) and in the size of the monoid 
Mg. From [MS77], we deduce that Ms has a size elementary in the size of S 
when matrices Ma are in Mm(N)- Therefore, in this case, the cardinal of Cs,x 
is elementary in the size of S and A(X). When matrices Ma are in Mm(Z), the 
elementary size of the monoid is an open problem to the best of our knowledge. 

We can easily extend the previous corollary in order to show that the struc- 
ture of the minimal NDD that represents Pre^^(X) corresponds to a BDD 
[Bry92] “concatenated” with a NDD that does not depends on k. This final 
result shows a new way for implementing a NDD library using all the BDD tech- 
niques for speeding-up the computation like cache-computation, minimization 
in linear time and strong canonical form, well-known in the field of BDD. Our 
symbolic model-checker Fast [Fas], will be available with this new library as 
soon as possible. 
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Abstract. We propose abstract regular model checking as a new generic tech- 
nique for verification of parametric and infinite- state systems. The technique 
combines the two approaches of regular model checking and verification by ab- 
straction. We propose a general framework of the method as well as several con- 
crete ways of abstracting automata or transducers, which we use for modelling 
systems and encoding sets of their configurations as usual in regular model check- 
ing. The abstraction is based on collapsing states of automata (or transducers) and 
its precision is being incrementally adjusted by analysing spurious counterexam- 
ples. We illustrate the technique on verification of a wide range of systems includ- 
ing a novel application of automata-based techniques to an example of systems 
with dynamic linked data stmctures. 



1 Introduction 

Model checking is nowadays widely accepted as a powerful technique for the verifica- 
tion of finite-state systems. However, many real-life systems, especially software sys- 
tems, exhibit various aspects requiring one to reason about infinite-state models (data 
manipulation, dynamic creation of objects and threads, etc.). Several approaches ex- 
tending model checking to be able to deal with them have recently been proposed. One 
of them is regular model checking [22,30, 12] - a generic, automata-based approach 
allowing for a uniform modelling and verification of various kinds of infinite-state sys- 
tems such as pushdown systems, (lossy) FIFO-channel systems, systems with counters, 
parameterised and dynamic networks of processes, etc. 

In regular model checking, configurations of systems are encoded as words over a 
finite alphabet and transitions are modelled as finite state transducers mapping config- 
urations to configurations. Finite automata can then be naturally used to represent and 
manipulate (potentially infinite) sets of configurations, and reachability analysis can 
be performed by computing transitive closures of transducers [21, 15,3,9] or images 
of automata by iteration of transducers [12, 28] - depending on whether dealing with 
reachability relations or reachability sets is preferred. To facilitate termination of the 
computation, which is in general not guaranteed as the problem being solved is unde- 
cidable, various acceleration methods are usually used. 

* This work was supported in part by the EU (FET project ADVANCE 1ST- 1999-29082), the 
French ministry of research (ACI project Securite Informatique), and the Czech Grant Agency 
(projects GA CR 102/04/0780 and GA CR 102/03/D21 1). 

R. Alur and D.A. Peled (Eds.): CAV 2004, LNCS 3114, pp. 372-386, 2004. 

© Springer- Verlag Berlin Heidelberg 2004 
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A crucial problem to be faced in regular model checking is the state space explosion 
in automata (transducer) representations of the sets of configurations (or reachability 
relations) being examined. One of the sources of this problem is related to the nature 
of the current regular model checking techniques. Typically, these techniques try to 
calculate the exact reachability sets (or relations) independently of the property being 
verified. However, if would often be enough to only compute an overapproximation of 
the reachability set (or relation) sufficiently precise to verify the given property. Indeed, 
this is the way large (or infinite) state spaces are being successfully handled outside the 
domain of regular model checking using the so-called abstract-check-refine paradigm 
[20,26, 14, 16] implemented, e.g., in tools for software model checking like Slam [6], 
Magic [13], or Blast [19]. All these tools use the method of predicate abstraction, where 
a finite set of boolean predicates is used to abstract a concrete system C into an abstract 
one A by considering equivalent the configurations of C that satisfy the same predicates. 
If a property is verified in A, if is guaranteed to hold in C too. If a counterexample is 
found in A, one can check if it is also a counterexample for C. If not, this spurious 
counterexample can be used to refine the abstraction such that the new abstract system 
A' no longer admits the spurious counterexample. In this way, one can construct finer 
and finer abstractions until a sufficient precision is achieved and the property is verified, 
or a real counterexample is found. 

In this work, we propose a new approach to regular model checking based on the 
abstract-check-refine paradigm. Instead of precise acceleration techniques, we use ab- 
stract fixpoint computations in some finite domain of automata. The abstract fixpoint 
computations always terminate and provide overapproximations of the reachability sets 
(relations). To achieve this, we define techniques that systematically map any automa- 
ton M to an automaton M' from some finite domain such that M' recognises a superset 
of the language of M. For the case that the computed overapproximation is too coarse 
and a spurious counterexample is detected, we provide effective principles allowing the 
abstraction to be refined such that the new abstract computation does not encounter the 
same counterexample. 

We propose two techniques for abstracting automata. They take into account the 
structure of the automata and are based on collapsing their states according to some 
equivalence relation. The first one is inspired by predicate abstraction. However, notice 
that contrary to classical predicate abstraction, we associate predicates with states of 
automata representing sets of configurations rather than with the configurations them- 
selves. An abstraction is defined by a sef of regular predicate languages Lp. We consider 
a slate q of an automaton M to “satisfy” a predicate language Lp if the intersection of Lp 
with the language L{M, q) accepted from the state q is not empty. Then, two states are 
equivalent if they satisfy the same predicates. The second abstraction technique is then 
based on considering two automata states equivalent if their languages of words up to 
a certain fixed length are equal. For both of these two abstraction methods, we provide 
effective refinement techniques allowing us to discard spurious counterexamples. 

We also introduce several natural alternatives to the basic approaches based on back- 
ward and/or trace languages of states of automata. For them, it is not always possible to 
guarantee the exclusion of a spurious counterexample, but according to our experience, 
they still provide good practical results. 
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All of our techniques can be applied to dealing with reachability sets (obtained by 
iterating length-preserving or even general transducers) as well as length-preserving 
reachability relations. 

We have implemented the different abstraction and refinement schemas in a proto- 
type tool and tested them on a number of examples of various types of systems including 
parametric networks of processes, pushdown systems, counter automata, systems with 
queues, and - for the first time in the context of regular model checking - an example 
of dynamic linked data structures. The experiments show that our techniques are quite 
powerful in all the considered cases and that they are complementary - different tech- 
niques turn out to be the most successful in different scenarios. The results are very 
promising and compare favourably with other existing tools. 

Related Work. In addition to what is mentioned above, many other results have been 
obtained for symbolic model checking of various kinds of infinite state systems, such 
as pushdown systems [10, 17], systems with counters [5, 18] or queues [4, 2, 1 1]. These 
works do not consider abstraction. For parameterised networks of processes, several 
methods using abstractions have been proposed [7, 24]. Contrary to our approach, these 
methods do not provide the possibility of refinement of the abstraction. Moreover, they 
are specialised for parameterised networks whereas our technique is generic. 



2 Finite Automata and Transducers 

We first provide a simple example that we use to demonstrate the way systems are 
modelled in (abstract) regular model checking and later to illustrate the verification 
techniques we propose. Then, we formalise the basic notions of finite automata and 
transducers and briefly comment on their use in verification by regular model checking. 

As a running example, we consider the transducer T in Fig. 1 . It models a simple 
parameterised system implementing a token passing algorithm. The system consists of 
a parametric number of processes arranged in an array. Each process either has (T) or 
does not have (N) the token. Each process can pass the token to its third right neighbour. 
The transducer includes the identity relation too. In the initial configurations described 
by the automaton Init, the second process has the token, and the number of processes 
is divisible by three. We want to show that it is not possible to reach any configuration 
where the last process has the token. This set is described by the automaton Bad. 

Formally, a. finite automaton is a 5-tuple M — (Q,l,,6,qo,F) where 2 is a finite set 
of states, Z a finite alphabet, 5 C Q x Z x (2 a set of transitions, qo G Q an initial state 
and F C g a set of final states. The transition relation g x Z* x g of M is defined as 
the smallest relation satisfying: (l)yq G Q : q q, (2) if {q,a,q') G 5, then q q' and 
(3) if q ^ q’ and q’ q", then q q". Given a finite automaton M and an equivalence 
relation ~ on its states, M/ ~ denotes the quotient automaton defined in the usual way. 

The language recognised by M from a state g € g is defined by L{M, q) = {w\ 3q' G 
F : q ^ q'}. The language L{M) is equal to L{M,q(fi). A set L C Z* is regular iff there 
exists a hnite automaton M such that L = L{M). We also define the backward language 
L (M,q) = {w\ qa ^ q} and the forward/backward languages of words up to a cer- 
tain length: L-’'{M,q) = {w G L{M,q) \ |w| < n} and similarly L -"(M,q). We define 
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T/T T/T N 




Fig. 1. A transducer modelling a simple token passing, initial, bad and reachable conhgurations 



the forward/backward trace languages of states T{M,q) = {tv G 2* | 3w' G 2* : ww' G 
L{M, q) } and similarly T (M, q) . Finally, we define accordingly forward/backward trace 
languages T-^{M,q) and T -’'{M,q) of traces up to a certain length. 

Let Z be a bnite alphabet and Eg = ZU {e}. A finite transducer over Z is a 5-tuple 
T = (2,Zg X Ze, 5,^0,F) where 2 is a finite set of states, 5 C g x Zg x Zg x Q a set of 
transitions, q^G Qw initial state and F C g a set of final states. A finite transducer is 

called a length-preserving transducer if its transitions do not contain e. The transition 

8 £ 

relation g x Z* x Z* x g is defined as the smallest relation satisfying: (1) q q 
for every q G Q, (2) if {q,a,b,q') G 5, then q q' and (3) if q q' and q' q" , 

then q q" . Then, by abuse of notation, we identify a transducer T with the relation 
|(w, m) I 3q' G F : qo q'}. For a set L C Z* and a relation F C Z* x Z*, we denote 
R{L) the set {w G Z* | 3W G L : (w', w) G R}. Let id C Z* x Z* be the identity relation and 
o the composition of relations. We define recursively the relations = id, x'+* = x o x' 
and X* = U^qX*. Below, we suppose id C x meaning that x' C x'+* for all i > 0. 

The properties we want to check are primarily reachability properties. Given a sys- 
tem with a transition relation modelled as a transducer x, a regular set of initial con- 
figurations given by an automaton Init, and a set of “bad” configurations given by an 
automaton Bad, we want to check x*{L{Init)) C)L{Bad) = 0. We transform more com- 
plicated properties into reachability by composing the appropriate property automaton 
with the system being checked. In this way, even liveness properties may be handled if 
the transition relation is instrumented to allow for loop detection as briefly mentioned 
later. For our running example, x*{L{Init)) is shown in Fig. 1(d), and the property of 
interest clearly holds. Notice, however, that in general, x* (L{Init) ) is neither guaranteed 
to be regular nor computable. In the following, the verification task is to find a regular 
overapproximation L D x* (L{Init)) such that L n L{Bad) = 0. 

3 Abstract Regular Model Checking 

In this section, we describe the general approach of abstract regular model checking 
and propose a common framework for automata abstraction based on collapsing states 
of the automata. This framework is then instantiated in several concrete ways in the 
following two sections. We concentrate on the use of abstract regular model checking 




376 



Ahmed Bouajjani, Peter Habermehl, and Tomas Vojnar 



for dealing with reachability sets. However, the techniques we propose may be applied 
to dealing with reachability relations too - though in the context of length-preserving 
transducers only. (Indeed, length-preserving transducers over an alphabet Z can be seen 
as finite-state automata over Z x Z.) We illustrate the applicability of the method to 
dealing with reachability relations by one of the experiments presented in Section 6. 



3.1 The Basic Framework of Automata Abstraction 

Let Z be a finite alphabet and Mx the set of all finite automata over Z. By an automata 
abstraction function a, we understand a function that maps every automaton M over Z 
to an automaton a(M) whose language is an overapproximation of the one of M, i.e. 
for some Ax C Mx, a : Mx — > Ax such that VM G Mx : L{M) C L{a{M)). We call a 
finitary iff its range Ax is finite. 

Working conveniently on the level of automata, we introduce the abstract transition 
function To; for a transition relation expressed as a transducer x over Z and an automata 
abstraction function a as follows: For each automaton M G Mx, Xa(M) = a(x(M)) 
where x(M) is the minimal deterministic automaton of x(L(M)). Now, we may itera- 
tively compute the sequence (x[>,(M)),>o. Since we suppose id C x, it is clear that if a 
is finitary, there exists k>0 such that x^+* (M) = x^(M). The definition of a implies 
L(x^ (M) ) D X* (L(M) ) . This means that in a finite number of steps, we can compute an 
overapproximation of the reachability set x*(L(M)). 

3.2 Refining Automata Abstractions 

We call an automata abstraction function a! a refinement of a iff VM G Mx :L{a! (M) ) C 
L(a(M)). Moreover, we call a! a true refinement iff it yields a smaller overapproxima- 
tion in at least one case - formally, iff 3M G Mx : L(a'(M)) C L(a(M)). 

A need to refine a arises when a situation depicted in Fig. 2 happens. Suppose we 
are checking whether no configuration from the set described by some automaton Bad 
is reachable from some given set of initial configurations described by an automaton 
Mq. We suppose L{Mq) \3L{Bad) = 0 - otherwise the property being checked is broken 
already by the initial configurations. Let M“ = a(Mo) and for each i > 0, M, = x(Mji j ) 
and Mf = a{Mi) = Xa(Mji j ). There exist k and I (0<k <l) such that: (1) Vi : 0 < / < Z : 
L{Mi)C]L{Bad) = 0. (2) L{Mi)f]L{Bad) = L{Xi) f 0. (3) If we define A,- as the minimal 
deterministic automaton accepting X^*(L(A,+i)) nL(M“) for all i such that 0 < i < I, 
then yi : k < i < I : L{Xi) nL(Mj) f 0 and L{Xf) = 0 despite L{Xf) f 0. Next, 

we see that either A: = 0 or L(Z<-_i) = 0, and it is clear that we have encountered a 
spurious counterexample. 

Note that when no I can be found such that L{M[) n L{Bad) f 0, the computation 
eventually reaches a fixpoint, and the property is proved to hold. On the other hand, if 
L(Ao) nL(Mo) f- 0, we have proved that the property is broken. 

The spurious counterexample may be eliminated by refining a to o! such that for any 
automaton M whose language is disjoint with L{Xjf), the language of its a' -abstraction 
will not intersect L{Xt) either. Then, the same faulty reachability computation (i.e. the 
same sequence of M, and Mf) may not be repeated because we exclude the abstraction 
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Fig. 2. A spurious counterexample in an abstract regular fixpoint computation 



of Mt to M“. Moreover, the reachability of the bad conhgurations is in general excluded 
unless there is another reason for it than overapproximating by subsets of L{Xt). 

A slightly weaker way of eliminating the spurious counterexample consists in rehn- 
ing a to a! such that at least the language of the abstraction of does not intersect 
with L{Xf). In such a case, it is not excluded that some subset of L{Xf) will again be 
used for an overapproximation somewhere, but we still exclude a repetition of exactly 
the same faulty computation. The obtained rehnement can be coarser, which may lead 
to more rehnements and a slower computation. On the other hand, the computation 
may terminate sooner due to quickly jumping to the fixpoint and use less memory due 
to working with less structured sets of configurations of the systems being verified - 
the abstraction is prevented from becoming unnecessarily precise in this case. For the 
latter reason, as illustrated later, one may sometimes successfully use even some more 
heuristic approaches that guarantee that the spurious counterexample will only eventu- 
ally be excluded (i.e. after a certain number of refinements) or that do not guarantee the 
exclusion at all. 

An obvious danger of using a heuristic approach that does not guarantee an ex- 
clusion of spurious counterexamples is that the computation may easily start looping. 
Notice, however, that even when we refine automata abstractions such that spurious 
counterexamples are always excluded, and the computation does not loop, we do not 
guarantee that it will eventually stop - we may keep refining forever. Indeed, the verifi- 
cation problem we are solving is undecidable in general. 

3.3 Abstracting Automata by Collapsing Their States 

In the following two sections, we discuss several concrete automata abstraction func- 
tions. They are based on automata state equivalence schemas that define for each au- 
tomaton from Mj; an equivalence relation on its states. An automaton is then abstracted 
by collapsing all its states related by this equivalence. We suppose such an equivalence 
to reflect the fact that the future and/or history of the states to be collapsed is close 
enough, and the difference may be abstracted away. 

Formally, an automata state equivalence schema E assigns an automata state equiv- 
alence Qx Q to each finite automaton M = (Q,l,,6,qo,F) over Z. We define the 
automata abstraction function Ue based on E such that VM e Mx : OCe(M) = M/ 
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We call E finitary iff ttE is finitary. We refine ttE by refining E such that more states are 
distinguished in at least some automata. 

The automata state equivalence schemas presented below are then all based on one 
of the following two basic principles: (1) comparing states wrt. the intersections of 
their forward/backward languages with some predicate languages (represented by the 
appropriate predicate automata) and (2) comparing states wrt. their forward/backward 
behaviours up to a certain bounded length. 

4 Automata State Equivalences Based on Predicate Languages 

The two automata state equivalence schemas we introduce in this section - F<p based 
on forward languages of states and based on backward languages - are both defined 
wrt. a finite set of predicate automata IP. They compare two states of a given automaton 
according to the intersections of their forward/backward languages with the languages 
of the predicates. Below, we first introduce the basic principles of the schemas and then 
add some implementation and optimisation notes. 

4.1 The Fy Automata State Equivalence Schema 

The automata state equivalence schema F <p defines two states of a given automaton to be 
equivalent when their languages have a nonempty intersection with the same predicates 
of fP. Formally, for an automatonM = (Q,'L,6,qo,F), Fy defines the state equivalence 
as the equivalence such that ^q\,q 2 &Q'-q\ ‘?2 (VP G P : L{P) n L{M, q\)fi^ 

%^L{P)fM{M,q2)fi^%). 

Clearly, as P is finite and there is only a finite number of subsets of P representing 
the predicates with which a given state has a nonempty intersection, F^i is finitary. 

For our example from Fig. 1, if we take as P the automata of the languages of the 
states of Bad, we obtain the automaton in Fig. 3(a) as the abstraction of Init from Fig. 
1(c). This is because all states of Init except the final one become equivalent. Then, 
the intersection of %{<x{Init)) with the bad configurations - shown in Fig. 3(c) - is not 
empty, and we have to refine the abstraction. 

The F schema may be refined by adding new predicates into the current set of pred- 
icates P. In particular, we can extend P by automata corresponding to the languages of 
all the states in Xj^ from Fig. 2. Theorem 1 proved in the full paper shows that this 
prevents abstractions of languages disjoint with such as - but not only - 

from intersecting with L{Xt). Thus, as already mentioned, a repetition of the same faulty 
computation is excluded, and the set of bad configurations will not be reached unless 
there is another reason for this than overapproximating by subsets of L{Xjfi. 

Theorem 1. Let us have any two finite automata M = {QM,^,?>M,qo ,Fm) and X = 
,q^,Fx) and a finite set of predicate automata P such that^qx G Qx '■ 3P G 
P : L{X,qx) = L{P). Then, ifL{M)CiL{X) = 0, L(aFj,(M)) nL(A) = 0 too. 

In our example, we refine the abstraction by extending P with the automata rep- 
resenting the languages of the states of Xq from Fig. 3(d). Fig. 3(e) then indicates for 
each state q of Init, the predicates corresponding to the states of Bad and Xq whose 
languages have a non-empty intersection with the language of q. The first two states of 
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(a) a{Init) (b) %{a{Init)) 






N N 



(e) Init labelled with states of Xq and Bad (f) The final result 



Fig. 3. An example using abstraction based on predicate languages 



Init are equivalent and are collapsed to obtain the automaton from Fig. 3(f), which is a 
fixpoint showing that the property is verified. Notice that it is an overapproximation of 
the set of reachable configurations from Fig. 1(d). 

The price of refining Fj; by adding predicates for all the states in may seem 
prohibitive, but fortunately this is not the case in practice. As described in Sect. 4.3, we 
do not have to treat all the new predicates separately. We exploit the fact that they come 
from one original automaton and share large parts of their structure. In fact, we can work 
just with the original automaton and each of its states may be considered an initial state 
of some predicate. This way, adding the original automaton as the only predicate and 
adding predicates for all of its states becomes roughly equal. Moreover, the refinement 
may be weakened by taking into account just some states of A), as discussed in Sect. 4.3. 

4.2 The B 2 > Automata State Equivalence Schema 

The Bj) automata state equivalence schema is an alternative of Fj) based on backward 
languages of states rather than forward. For an automaton M= {Q,l,,d,qo,F), it defines 
the state equivalence as the equivalence such that yqi,qi & Q ■ qi qi<^ (VF G 
T : L{P) r\"L{M,qi)^9^ L{P) n "ijM, gz) ^ ®) ■ 

Clearly, B^j is finitary for the same reason as Fj;. It may also be refined by extending 
P by automata corresponding to the languages of all the states in from Fig. 2. As 
stated in Theorem 2, the effect is the same as for F^j. 
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Theorem 2. Let us have any two finite automata M = ,Fm) and X = 

{Qx,X,5x,qQ,Fx) and a finite set of predicate automata fP such thatMqx € Qx '■ 3P G 
d’:'L(X,qx) =L(F). Then, (/L(M) nL(X) = 0, L(aBj,(M)) nL(X) = 0 too. 

4.3 Implementing and Optimising Collapsing Based on 

The abstraction of an automaton M wrt. the automata state equivalence schema Fy may 
be implemented by first labelling states of M by the states of predicate automata in 
P with whose languages they have a non-empty intersection and then collapsing the 
states of M that are labelled by the initial states of the same predicates. (Provided the 
sets of states of the predicate automata are disjoint.) The labelling can be done in a 
way similar to constructing a backward synchronous product of M with the particular 
predicate automata: (1) VP G T Wq^ G Fp Wqf G Fm- qp is labelled by qp, and (2) 
VP G P yq^,q 2 G Qp Vqf ,qf G Qm'- if is labelled by q^, and there exists a G Z 
such that q^ and<?f q2, then q^ is labelled with q^. The abstraction of an 

automaton M wrt. the B<p schema may be implemented analogously. 

If the above construction is used, it is then clear that when refining ¥pMtp, we can 
just add Xk into P and modify the construction such that in the collapsing phase, we 
simply take into account all the labels by states of Xj^ and do not ignore the (anyway 
constructed) labels other than q^'‘ . 

Moreover, we can try to optimise the rehnement of by replacing in P 

by its important tail/head part defined wrt. M<. as the subautomaton of based on 
the states of Xk that appear in at least one of the labels of wrt. 
respectively. As stated in Theorem 3 (proved in the full paper), the effect of such a 
refinement corresponds to the weaker way of rehning automata abstraction functions 
described in Section 3.2. This is due to the strong link of the important tail/head part of 
Xk to Mk wrt. which it is computed. A repetition of the same faulty computation is then 
excluded, but the obtained abstraction is coarser, which may sometimes speed up the 
computation as we have already discussed. 

Theorems. Let M and X be any finite automata over 1, and Y = {QY,X,6Y,qQ,FY) the 
important tail/head part ofX wrt. Vp/Urp and M. If IP' is such that WqY G Qy 3P G IP' : 
L{Y,qY)/t{Y,qY)=L{P) andL{M)nL{X)=&, nL(Z) = 0. 

A further possible heuristic to optimise the rehnement of F^j/Brp is trying to hnd just 
one or two key states of the important tail/head part of Z*. such that if their languages 
are considered in addition to P, i(M“) will not intersect L{Xf). 

We close the section by noting that in the initial set of predicates IP of ¥pfRp, 
we may use, e.g., the automata describing the set of bad conhgurations and/or the set of 
initial conhgurations. Further, we may also use the domains or ranges of the transducers 
encoding the particular transitions in the systems being examined (whose union forms 
the one-step transition relation T which we iterate). The meaning of the latter predicates 
is similar to using guards or actions of transitions in predicate abstraction [8]. 
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(b) The final result 



Fig. 4. An example using abstraction based on languages of words up to length n (for n = 2) 



5 Automata State Equivalences Based 
on Finite-Length Languages 

We now present the possibility of defining automata state equivalence schemas based on 
comparing automata states wrt. a certain bounded part of their languages. It is a simple, 
yet (according to our practical experience) often quite efficient approach. As a basic 
representative of this kind of schemas, we first present a schema based on forward 
languages of words of a limited length. Then, we discuss its possible alternatives. 

The automata state equivalence schema defines two states of an automaton to 
be equal if their languages of words of length up to a certain bound n are identical. 
Formally, for an automatonM = (2, 21,5, < 70 , F), F^ defines the state equivalence as the 
equivalence such that 1,^2 &Q' qi = L-"{M,q 2 ). 

F^ is clearly It may be refined by incrementally increasing the bound n on 

the length of the words considered. This way, as we work with minimal deterministic 
automata, we may achieve the weaker type of refinement described in Section 3.2. Such 
an effect is achieved when n is increased to be equal or bigger than the number of states 
in M<; from Fig. 2 minus one. In a minimal deterministic automaton, this guarantees that 
all states are distinguishable wrt. and M<, will not be collapsed at all. 

In Fig. 4, we apply F^ to the example from Fig. 1 . We choose « = 2. In this case, the 
abstraction of the Init automaton is Init itself. Fig. 4(a) indicates the states of x{Init) that 
have the same languages of words up to size 2 and are therefore equivalent. Collapsing 
them yields the automaton shown in Fig. 4(b) (after determinisation and minimisation), 
which is a fixpoint. Notice that it is a different overapproximation of the set of reachable 
configurations than the one obtained using Fa>. If we choose « = I, we obtain a similar 
result, but we need one refinement step of the above described kind. 

Let us, however, note that according to our practical experience, the increment of 
n by \Qm \ — 1 may often be too big. Alternatively, one may use its fraction (e.g., one 
half), increase n by the number of states in A), (or its fraction), or increase n just by one. 
In such cases, an immediate exclusion of the faulty run is not guaranteed, but clearly, 
such a computation will be eventually excluded because n will sooner or later reach the 
necessary value. The impact of working with abstractions refined in a coarser way is 
then like in the case of using F^j/By. 

Regarding the initial value of n, one may use, e.g., the number of states in the 
automaton describing the set of initial configurations or the set of bad configurations, 
their fraction, or again just one. 

As a natural alternative to dealing with forward languages of words of a limited 
length, we may also think of backward languages of words of a limited length and for- 
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ward or backward languages of traces with a limited length. The automata equivalence 
schemas Fj, and Bj based on them can be formally defined analogously to F^. 

Clearly, all these schemas are finitary. Moreover, we can refine them in a similar 
way as F^. For FJ and Bj, however, no guarantee of excluding a spurious counterex- 
ample may be provided. Using Fj, e.g., we can never distinguish the last three states of 
the automaton in Fig. 4(b) - they all have the same trace languages. Thus, we cannot 
remember that the token cannot get to the last process. Nevertheless, despite this, our 
practical experience shows the schemas based on traces are quite successful in practice. 

6 Experiments 

We have implemented the ideas proposed in the paper in a prototype tool written in YAP 
Prolog using the FSA library [29]. To demonstrate that abstract regular model checking 
is applicable to verification of a broad variety of systems, we tried to apply the tool to a 
number of different verification tasks. 

6.1 The Types of Systems Verified 

Parameterised Networks of Processes. We considered several slightly idealised mu- 
tual exclusion algorithms for an arbitrary number of processes (namely the Bakery, 
Burns, Dijkstra, and Szymanski algorithms in versions similar to [23]). In most of these 
systems, particular processes are finite-state. We encode global conhgurations of such 
systems by words whose length corresponds to the number of participating processes, 
and each letter represents the local state of some process. In the case of the Bakery 
algorithm where each process contains an unbounded ticket value, this value is not rep- 
resented directly, but encoded in the ordering of the processes in the word. 

We verified the mutual exclusion property of the algorithms, and for the Bakery al- 
gorithm, we verified that some process will always eventually get to the critical section 
(communal liveness) as well as that each individual process will always eventually get 
there (individual liveness) under suitable fairness assumptions. For checking liveness, 
we manually composed the appropriate Biichi automata with the system being verified. 
Loop detection was allowed by working with pairs of configurations consisting of a 
remembered potential beginning of a loop (fixed at a certain point of time) and the cur- 
rent conhguration being further modified. Checking that a loop is closed then consisted 
in checking that a pair of the same configurations was reached. To encode the pairs of 
configurations using finite automata, we interleaved their corresponding letters. 

Push-Down Systems. We considered a simple system of recursive procedures - the 
plotter example from [17]. We verified a safety part of the original property of interest 
describing the correct order of plotter instructions to be issued. In this case, we use 
words to encode the contents of the stack. 

Systems with Queues. We experimented with a model of the Alternating Bit Proto- 
col (ABP) for which we checked correctness of the delivery order of the messages. A 
word encoding a configuration of the protocol contained two letters representing inter- 
nal states of the communicating processes. Moreover, it contained the contents of the 
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two lossy communication channels with a letter corresponding to each message. Let 
us note that in this case, as well as in the above and below case, general (non-length- 
preserving) transducers were used to encode transitions of the systems. 

Petri Nets, Systems with Counters. We examined a general Petri net with inhibitor 
arcs, which can be considered an example of a system with unbounded counters too. 
In particular, we modelled a ReadersAVriters system extended with a possibility of dy- 
namic creation/deletion of processes, for which we verified mutual exclusion between 
readers and writers and between multiple writers. We considered a correct version of the 
system as well as a faulty one, in which we omitted one of the Petri net arcs. Markings 
of places in the Petri net were encoded in unary and the particular values were put in 
parallel. (Using this encoding, a marking of a net with places p and q, two tokens in p, 
and four in q would be encoded as q\q\pq\pq.) In some other examples of systems with 
counters (such as the Bakery algorithm for two processes with unbounded counters), 
we also successfully used a binary encoding of the counters like in NDDs [30]. 

Dynamic Linked Data Structures. We considered verification of a procedure for re- 
versing lists shown in Fig. 5. As according to our knowledge, it is for the first time that 
regular model checking has been applied to such a task, let us now spend a bit more 
time with this experiment. 

When abstracting the memory manipulated by 
the procedure, we focus on the cases where in the 
first M memory cells (we take the biggest M possible) 1: x := 0; 
there are at most two linked lists linking consecutive 2: while (list next) { 
cells, the first list in a descending way and the sec- 3: y:= list next; 

ond one in an ascending way. We represent config- 4: list next := x; 

urations of the procedure as words over the follow- 5: x:=list; list:=y; 

ing alphabet: list items are represented by symbols i, 6: } 

left/right pointers by </>, pointer variables are rep- 7: list next := x; 
resented by their names (list is shortened to 1), and 

o is used to represent the memory outside the list. Reversing a linear list 

Moreover, we use symbols iv (resp. py) to denote that 

V points to i (resp. outside the list). We use | to separate the ascending and descending 
lists. Pointer variables pointing to null are not present in the configuration representa- 
tions. A typical abstraction of the memory may then look like i<i<ljil>lpx where 
the first list contains three items, the second one two, list points to the beginning of 
the second list, x points outside the two lists, and y points to null. Provided such an 
abstraction is used for the memory contents (prefixed with the current control line), it is 
not difficult to associate transducers to each command of the procedure. For example, 
the transducer corresponding to the command list next := x at line 4 transforms a 
typical configuration 4 / < ix | // > iy > / o to the configuration 5 [<ix<il\iy>[p 
(the successor of the item pointed to by / is not anymore the one pointed to by y, but 
the one pointed to by x). Then, the transducer x corresponding to the whole procedure 
is the union of the transducers of all the commands. 
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If the memory contents does not fit the above described form, we abstract it to a sin- 
gle word with the “don’t know” meaning. However, when we start from a configuration 
like 1 il >[> i_o or 1 [<[< U_o, the verification shows that such a situation does not 
happen. Via a symmetry argument exploiting the fact that the procedure never refers to 
concrete addresses, the results of the verification may then easily be generalised to lists 
with items stored at arbitrary memory locations. 

By computing an abstraction of the reachability set x*{Init), we checked that the 
procedure outputs a list. Moreover, by computing an overapproximation of the reacha- 
bility relation x* of the system, we checked that the output list is a reversion of the input 
one (modulo the fact that we consider a finite number of different items). To speed up 
the computation, the reachability relation was restricted to the initial configurations, i.e. 
to idjnit O X . 

6.2 A Summary of the Results 

Our method has been quite successful in all the described experiments. The best results 
(corresponding to a certain automata state equivalence used, its initialisation, and the 
way of refinement) for the particular examples obtained from our prototype tool were 
mostly under 1 sec. on an Intel Pentium 4 processor at 1.7 GHz. The only exceptions 
were checking individual liveness in the Bakery example where we needed about 9 
sec., the Readers/Writers example where we needed about 6 sec., and the example of 
reversing lists where on the level of working with the reachability relation, we needed 
about 22 sec. (A more detailed description of the results may be found in the full paper.) 
Taking into account that the tool used was an early prototype written in Prolog, the 
results are very encouraging. For example, on the same computer, the Uppsala Regular 
Model Checker [3] in version 0. 10 took from about 8 to 1 1 seconds when applied to the 
Burns, Szymanski, and Dijkstra examples over whose comparable encoding we needed 
in the most successful scenarios from 0.06 to 0.73 sec. 

The results we obtained from our experiments also showed that apart from cases 
where the approaches based on languages of words/traces up to a bounded length and 
the ones based on intersections with predicate languages are roughly equal, there are 
really cases where either the former or the latter approach is clearly faster. The latter 
approach is faster, e.g., in the Dijkstra and ReadersAVriters examples whereas the for- 
mer, e.g., in the cases of reversing lists or checking the individual liveness in the Bakery 
example. This experimentally justifies our interest in both of the techniques. 

Let us note that for some of the classes of systems we considered, there exist various 
special purpose verification approaches, and the appropriate verification problems are 
sometimes even decidable (as, e.g., for push-down systems [10, 17, 27] or lossy channel 
systems [4, 2, 1]). However, we wanted to show that our approach is generic and can be 
uniformly applied to all these systems. Moreover, in the future, with a new version of 
our tool, we would like to compare the performance of abstract regular model checking 
with the specialised approaches on large systems. We believe that while we can hardly 
outperform these algorithms in general, in some cases of systems with complex state 
spaces, our approach could turn out to be quite competitive due to not working with the 
exact representation of the state spaces, but their potentially much simpler approxima- 
tions in which many details not Important for the property being checked are ignored. 
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7 Conclusions 

We have proposed a new technique for verification of parameterised and infinite- state 
systems. The technique called abstract regular model checking combines the approach 
of regular model checking with the abstract-check-refine paradigm. We have described 
the general framework of the method as well as several concrete strategies for abstract- 
ing automata or transducers, which are used in the framework for representing reacha- 
bility sets and relations, by collapsing their states. As we have illustrated on a number 
of experiments we did, including a novel application of automata-based techniques to 
verification of systems with dynamic linked structures, the technique is very broadly 
applicable. Moreover, compared to the other techniques of regular model checking, it is 
also quite efficient due to working with approximations of reachability sets or relations 
with precision found iteratively by eliminating spurious counterexamples and sufficient 
just to verify the property of interest. 

In the future, we plan to implement the techniques we proposed in the paper in a 
more efficient way and test them on larger examples. An interesting theoretical ques- 
tion is then whether some guarantees of termination can be obtained at least for some 
classes of systems. Further, the way of dealing with liveness properties within abstract 
regular model checking can be put on a more systematic basis allowing for a better 
automation. Similarly, we would like to extend our results on verifying systems with 
linear (linearisable) dynamic linked data structures and see how automata-based ver- 
ification techniques compare with static analysis techniques like shape analysis [25]. 
Finally, we believe that there is a wide space for generalising the method to working 
with non-regular reachability sets and/or systems with non-linear (tree-like or in general 
graph-like) structure of states. The latter generalisation is primarily motivated by our in- 
terest in verifying multithreaded programs with recursive procedures and/or programs 
with dynamically allocated memory structures that cannot be easily linearised. 



Acknowledgement 

We would like to thank Andreas Podelski for fruitful discussions. 



References 

1. R Abdulla, A. Annichini, and A. Bouajjani. Symbolic Verification of Lossy Channel Sys- 
tems: Application to the Bounded Retransmission Protocol. In Proc. ofTACAS, volume 1579 
of LNCS. Springer, 1999. 

2. P. Abdulla, A. Bouajjani, and B. Jonsson. On-the-fly Analysis of Systems with Unbounded, 
Lossy Fifo Channels. In Proc. of CAWS, volume 1427 of LNCS. Springer, 1998. 

3. PA. Ahdulla, J. d’Orso, B. Jonsson, and M. Nilsson. Algorithmic improvements in regular 
model checking. In Proc. of CAV’03, volume 2725 of LNCS. Springer, 2003. 

4. B. Boigelot and P. Godefroid. Symbolic verification of communication protocols with infinite 
state spaces using QDDs. In Proc. ofCAV’96, volume 1 102 of LNCS. Springer, 1996. 

5. B. Boigelot and P. Wolper. Symbolic verification with periodic sets. In Proc. of CAV’94, 
volume 818 of LNCS, pages 55-67. Springer, 1994. 

6. T. Ball and S. K. Rajamani. The SLAM toolkit. In Proc. CAV’Ol, LNCS. Springer, 2001. 




386 



Ahmed Bouajjani, Peter Habermehl, and Tomas Vojnar 



7. K. Baukus, S. Bensalem, Y. Lakhnech, and K. Stahl. Abstracting WSIS systems to verify 
parameterized networks. In Proc. ofTACAS, volume 1785 of LNCS. Springer, 2000. 

8. S. Bensalem, Y. Lakhnech, and S. Owre. Computing abstractions of infinite state systems 
compositionally and automatically. In Proc. of CAV’98, LNCS. Springer, 1998. 

9. B. Boigelot, A. Legay, and P. Wolper. Iterating transducers in the large. In Proc. CAV'03, 
volume 2725 of LNCS. Springer, 2003. 

10. A. Bouajjani, J. Esparza, and O. Maler. Reachability analysis of pushdown automata: Appli- 
cation to model-checking. In Proc. of CONCUR’97, LNCS. Springer, 1997. 

11. A. Bouajjani and P. Habermehl. Symbolic Reachability Analysis of Fifo-Channel Systems 
with Nonregular Sets of Configurations. Theoretical Computer Science, 221(1-2), 1999. 

12. A. Bouajjani, B. Jonsson, M. Nilsson, and T. Touili. Regular model checking. In Proc. of 
CAV’OO, volume 1855 of LNCS. Springer, 2000. 

13. S. Chaki, E. Clarke, A. Groce, J. Ouaknine, O. Strichman, and K. Yorav. Efficient verification 
of sequential and concurrent c programs. Formal Methods in System Design, 2004. To appear. 

14. E. M. Clarke, O. Grumberg, S. Jha, Y. Lu, and H. Veith. Counterexample-guided abstraction 
refinement. In Proc. CAV’OO, volume 1855 of LNCS. Springer, 2000. 

15. D. Dams, Y. Lakhnech, and M. Steffen. Iterating transducers. In Proc. CAV’Ol, volume 2102 
of LNCS. Springer, 2001. 

16. S. Das and D.L. Dill. Counter-example based predicate discovery in predicate abstraction. 
In Formal Methods in Computer-Aided Design, 2002. 

17. J. Esparza, D. Hansel, P. Rossmanith, and S. Schwoon. Efficient algorithms for model check- 
ing pushdown systems. In Proc. of CAV’OO, volume 1855 of LNCS. Springer, 2000. 

18. A. Finkel and J. Leroux. How to compose presburger-accelerations: Applications to broad- 
cast protocols. In Proc. of FST&TCS’02, volume 2556 of LNCS. Springer, 2002. 

19. T. A. Henzinger, R. Jhala, R. Majumdar, and G. Sutre. Software verification with Blast. In 
Proc. of 10th SPIN Workshop, volume 2648 of LNCS. Springer, 2003. 

20. T. A. Henzinger, R. Jhala, R. Majumdar, and G. Sutre. Lazy abstraction. In Proc. ofPOPL’02. 
ACM Press, 2002. 

21 . B. Jonsson and M. Nilsson. Transitive closures of regular relations for verifying infinite-state 
systems. In Proc. ofTACAS’OO, volume 1785 of LNCS. Springer, 2000. 

22. Y. Kesten, O. Maler, M. Marcus, A. Pnueli, and E. Shahar. Symbolic model checking with 
rich assertional languages. Theoretical Computer Science, 256(1-2), 2001. 

23. M. Nilsson. Regular Model Checking. Licentiate Thesis, Uppsala University, Sweden, 2000. 

24. A. Pnueli, J. Xu, and L. Zuck. Liveness with (0,1, infinity )-counter abstraction. In Proc. of 
CAV’02, volume 2404 of LNCS. Springer, 2002. 

25. S. Sagiv, TW. Reps, and R. Wilhelm. Parametric shape analysis via 3-valued logic. TOPLAS, 
24(3), 2002. 

26. H. Saidi. Model checking guided abstraction and analysis. In Proc. ofSAS’OO, volume 1824 
of LNCS. Springer, 2000. 

27. Stefan Schwoon. Model-Checking Pushdown Systems. PhD thesis, Technische Universitat 
Miinchen, 2002. 

28. T. Touili. Widening techniques for regular model checking. ENTCS, 50, 2001. 

29. G. van Noord. FSA6.2, 2004. http://odur.let.rug.nl/~vannoord/Fsa/ 

30. P. Wolper and B. Boigelot. Verifying systems with infinite but regular state spaces. In Proc. 
of CAV’98, volume 1427, 1998. 




Global Model-Checking of Infinite-State Systems 



Nir Piterman^’* and Moshe Y. Vardi^’** 

^ Weizmann Institute of Science, Department of Computer Science, Rehovot 76100, Israel 
nir . pitermanSweizmann .ac.il 
http ; / /www . wisdom . weizmann .ac.il/~nirp 
^ Rice University, Department of Computer Science, Houston, TX 77251-1892, USA 
vardi@cs . rice . edu 
http ; / /www . cs . rice . edu/~vardi 



Abstract. In this paper we extend the automata-theoretic framework for rea- 
soning about infinite-state sequential systems to handle also the global model- 
checking problem. Our framework is based on the observation that states of 
such systems, which carry a finite but unbounded amount of information, can 
be viewed as nodes in an infinite tree, and transitions between states can be simu- 
lated by finite-state automata. Checking that the system satisfies a temporal prop- 
erty can then be done by a two-way automaton that navigates through the tree. 
The framework is known for local model checking. For branching time proper- 
ties, the framework uses two-way alternating automata. For linear time properties, 
the framework uses two-way path automata. In order to solve the global model- 
checking problem we show that for both types of automata, given a regular tree, 
we can construct a nondeterministic word automaton that accepts all the nodes in 
the tree from which an accepting run of the automaton can start. 



1 Introduction 

An important research topic over the past decade has been the application of model 
checking to infinite-state systems. A major thrust of research in this area is the applica- 
tion of model checking to infinite-state sequential systems. These are systems in which 
a state carries a finite, but unbounded, amount of information, e.g., a pushdown store. 
The origin of this thrust is the important result by Muller and Schupp that the monadic 
second-order theory of context-free graphs is decidable [MS85]. As the complexity 
involved in that decidability result is nonelementary, researchers sought decidability re- 
sults of elementary complexity. This started with Burkart and Steffen, who developed 
an exponential-time algorithm for model-checking formulas in the alternation-free p- 
calculus with respect to context-free graphs [BS92]. Researchers then went on to extend 
this result to the p-calculus, on one hand, and to more general graphs on the other 
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hand, such as pushdown graphs [BS95,Wal96], regular graphs [BQ96], and prefix- 
recognizable graphs [Cau96]. One of the most powerful results so far is an exponential- 
time algorithm by Burkart for model checking formulas of the /x-calculus with respect 
to prefix-recognizable graphs [Bur97]^ Some of this theory has also been reduced to 
practice. Pushdown model-checkers such as Mops [CW02], Moped [ESOl], and Bebop 
[BROO] (to name a few) have been developed. Successful applications of these model- 
checkers to the verification of software are reported, for example, in [CW02]. 

We usually distinguish between local and global model-checking. In the first set- 
ting we are given a specific state of the system and determine whether it satisfies a given 
property. In the second setting we compute (a finite representation) of the set of states 
that satisfy a given property. For many years global model-checking algorithms were 
the standard; in particular, CTL model checkers [CES86], and symbolic model-checkers 
[BCM+92] perform global model-checking. While local model checking holds the 
promise of reduced computational complexity [SW91] and is more natural for ex- 
plicit LTL model-checking [CVWY92], global model-checking is especially important 
where the model-checking is only part of the verification process, in [CKKVOl] global 
model-checking is used to supply coverage information, which informs us what parts 
of the design under verification are relevant to the specified properties. In [LBBOOl] an 
infinite-state system is abstracted into a finite-state system. Global model-checking is 
performed over the finite-state system and the result is then used to compute invariants 
for the infinite-state system. In [PRZOl] results of global model-checking over small 
instances of a parameterized system are generalized to invariants for every value of the 
system’s parameter. 

An automata-theoretic framework for reasoning about infinite- state sequential sys- 
tems was developed in [KV00,KPV02]. The automata-theoretic approach uses the the- 
ory of automata as a unifying paradigm for system specification, verification, and syn- 
thesis [EJ91,Kur94,VW94]. Automata enable the separation of the logical and the algo- 
rithmic aspects of reasoning about systems, yielding clean and asymptotically optimal 
algorithms. Traditionally automata-theoretic techniques provide algorithms only for lo- 
cal model-checking [CVWY92,KV00,KPV02]. As model-checking in the automata- 
theoretic approach is reduced to the emptiness of an automaton, it seems that this limita- 
tion to local model checking is inherent to the approach. For finite-state systems we can 
reduce global model-checking to local model-checking by iterating over all the states 
of the system, which is essentially what happens in symbolic model checking of LTL 
[BCM+92]. For infinite-state systems, however, such a reduction cannot be applied. In 
this paper we remove this limitation of automata-theoretic techniques. We show that the 
automata-theoretic approach to infinite-state sequential systems generalizes nicely to 
global model-checking. Thus, all the advantages of using automata-theoretic methods, 
e.g., the ability to handle regular labeling and regular fairness constraints, the ability 
to handle ^-calculus with backward modalities, and the ability to check realizability 
[KV00,ATM03], apply also to the more general problem of global model checking. 

We use two-way tree alternating automata to reason about properties of infinite-state 
sequential systems. The idea is based on the observation that states of such systems can 



* The monadic second-order theory and /r-calculus model-checking over high-order pushdown 
graphs is also decidable [KNU03,Cac03]. The complexity of both problems is nonelementary. 
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be viewed as nodes in an infinite tree, and transitions between states can be simulated 
by finite-state automata. Checking that the system satisfies a temporal property can then 
be done by a two-way alternating automaton. Local model checking is then reduced to 
emptiness or membership problems for two-way tree automata. 

In this work, we give a solution to the global model-checking problem. The set 
of configurations of a prefix-recognizable system satisfying a /i-calculus property can 
be infinite, but it is regular, so it is finitely represented. We show how to construct a 
nondeterministic word automaton that accepts all the configurations of the system that 
satisfy (resp., do not satisfy) a branching-time (resp., linear-time) property. In order 
to do that, we study the global membership problem for two-way alternating parity 
tree automata and two-way path automata. Given a regular tree, the global member- 
ship problem is to find the set of states of the automaton and locations on the tree from 
which the automaton accepts the tree. We show that in both cases the question is not 
harder than the simple membership problem (is the tree accepted from the root and 
the initial state). Our result matches the upper bounds for global model checking estab- 
lished in [EHRS00,EKS01,Cac02]. Our contribution is in showing how this can be done 
uniformly in an automata-theoretic framework rather than via an eclectic collection of 
techniques. 

2 Preliminaries 

Labeled Rewrite Systems. A labeled transition graph is G = {S, S, L, p, sq), where 
i7 is a finite set of labels. S' is a (possibly infinite) set of states, L : S ^ S is a. 
labeling function, p C S x S is a transition relation, and sq G Sq is an initial state. 
When p{s, s'), we say that s' is a successor of s, and s is a predecessor of s'. Eor a 
state s G S, we denote by G® = {E, S, L, p, s), the graph G with s as its initial state. 
An s-computation is an infinite sequence of states sq, si, . . . G S“ such that sq = s 
and for all t > 0, we have p{si, Si+i). An s-computation sq, si, . . . induces the s-trace 
L{so) ■ L{si) • • • G Let % C be the set of all s-traces. 

A rewrite system is R = {E, V, Q, L, T), where i7 is a finite set of labels, 1^ is a 
finite alphabet, Q is a finite set of states, L ■. Q x V* ^ E is a labeling function that 
depends only on the first letter of x (Thus, we may write L ■. Q xV iJ {e} ^ E. Note 
that the label is defined also for the case that x is the empty word e). The finite set of 
rewrite rules T is defined below. The set of configurations of the system is Q x V*. 
Intuitively, the system has finitely many control states and an unbounded store. Thus, in 
a configuration {q, x) G Q x V* we refer to q as the control state and to x as the store. 
We consider here two types of rewrite systems. In a pushdown system, each rewrite rule 
is {q, A,x,q') G QxV xV* x Q. Thus, T <G QxV xV* xQ.lna prefix-recognizable 
system, each rewrite rule is {q,a, (3,j,q') G Q x reg{V) x reg{V) x reg{V) x Q, 
where reg{V) is the set of regular expressions over V. Thus, T C Q x reg{V) x 
reg{V) X reg{V) x Q. Eor a word w G V* and a regular expression r G reg{V) we 
write w G r to denote that w is in the language of the regular expression r. We note that 
the standard definition of prefix-recognizable systems does not include control states. 
Indeed, a prefix-recognizable system without states can simulate a prefix-recognizable 
system with states by having the state as the first letter of the unbounded store. We use 
prefix-recognizable systems with control states for the sake of uniform notation. 
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The rewrite system R starting in configuration (go, a^o) induces the labeled transi- 
tion graph Q xV* ^ V , pn, (qQ,xo)}. The states of G/j are the config- 

urations of R and ((g, z), {q' , z')) G pR if there is a rewrite rule t G T leading from 
configuration (g, z) to configuration (g', z'). Formally, if i? is a pushdown system, then 
Pniig, A ■ y), (g', x ■ y)) if (g, A, x, q') G T; and if i? is a prefix-recognizable system, 
then pR{{q,x ■ y), {q',x' ■ y)) if there are regular expressions a, [3, and 7 such that 
x G a, y G /3, x' G and (g, a, f3, 7 , g') G T. Note that in order to apply a rewrite 
rule in state (g, z) G Q x V* of a pushdown graph, we only need to match the state 
g and the first letter of z with the second element of a rule. On the other hand, in an 
application of a rewrite rule in a prefix-recognizable graph, we have to match the state 
g and we should find a partition of z to a prefix that belongs to the second element of 
the rule and a suffix that belongs to the third element. A labeled transition graph that is 
induced by a pushdown system is called a pushdown graph. A labeled transition system 
that is induced by a prefix-recognizable system is called a prefix-recognizable graph. 

Consider a prefix-recognizable system R = {S, V, Q, L, T). For a rewrite rule ti = 
( s, ai, (3i, 7 *, s') GT, \&tUx = (V,Qx,ql,Vx,Fx), for A G {ai,/ 3 », 7 j, be the non- 
deterministic automaton for the language of the regular expression A. We assume that 
all initial states have no incoming edges and that all accepting states have no outgoing 
edges. We collect all the states of all the automata for a, f3, and 7 regular expressions. 
Formally, Qa — Qp — QPi^ and Qji' 

We define the size 1 1 T | j of T as the space required in order to encode the rewrite rules 
in T and the labeling function. Thus, in a pushdown system, ||r|j = ^ ^ g')eT 1^1’ 

and in a prefix-recognizable system, ||T|| = Y.{q,a,p,-y,q')(^T \^a\ + \Up\ + 

We are interested in specifications expressed in the /x-calculus [Koz83] and in LTL 
[Pnu77]. For introduction to these logics we refer the reader to [Eme97]. We want to 
model check pushdown and prefix-recognizable systems with respect to specifications 
in these logics. We differentiate between local and global model-checking. In local 
model-checking, given a graph G and a specification ip, one has to determine whether 
G satisfies ip. In global model-checking we are interested in the set of configurations s 
such that G® satisfies p. As G is infinite, we hope to find a finite representation for this 
set. It is known that the set of configurations of a prefix-recognizable system satisfying 
a monadic second-order formula is regular [Cau96,Rab72], which implies that this also 
holds for pushdown systems and for /r-calculus and LTL specifications. 

In this paper, we extend the automata-theoretic approach to model-checking of se- 
quential infinite state systems [KV00,KPV02] to global model-checking. Our model- 
checking algorithm returns a nondeterministic finite automaton on words (NFW, for 
short) recognizing the set of configurations that satisfy (not satisfy, in the case of LTL) 
the specification. The complexity of our algorithms matches the previously known up- 
per bounds [EHRS00,EKS01,Cac02]2. 

Theorem 1. Global model-checking for a system R and a specification p is solvable 

- in time (||T||)^ • and space (||T||)^ • where R is a pushdown system 

and p is an LTL formula. 

^ In order to obtain the stated bound for prefix-recognizable systems and LTL specifications one 
has to combine the result in [EKSOl] with our reduction from prefix-recognizable systems to 
pushdown systems with regular labeling [KPV02]. 
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- in time (||T|| and space (||T'||)^ • where R is a prefix- 

recognizable system and p is an LTL formula. 

— in time where R is a prefix-recognizable system and p is a p-calculus 

formula of alternation depth k. 

Alternating Two-Way Automata. Given a finite set T of directions, an T -tree is a set 
T C T* such that if i; • x € T, where v G T and x € T*, then also x € T. The 
elements of T are called nodes, and the empty word £ is the root of T. For every v gT 
and X G T, the node x is the parent of v ■ x. Each node x e of T has a direction in 
T. The direction of the root is the symbol _L (we assume that _L ^ T). The direction of 
a node r; • x is i;. We denote by dir{x) the direction of node x. An T-tree T is a full 
infinite tree if T = T*. A path tt of a tree T is a set tt C T such that e G tt and for 
every x G tt there exists a unique v G T such that v ■ x G tt. 

Given two finite sets T and S, a S-labeled T-tree is a pair {T, V) where T is an 
T -tree and V : T ^ S maps each node of T to a letter in S. When T and E are not 
important or clear from the context, we call (T, V) a labeled tree. 

A tree is regular if it is the unwinding of some finite labeled graph. More formally, 
a transducer V is a tuple {T, S, Q, qo, tj, L), where T is a finite set of directions, E is 
a finite alphabet, Q is a finite set of states, qo G Q is a start state, t] : Q x T — > Q is 
a deterministic transition function, and L : Q ^ E is a labeling function. We define 
f] : T* ^ Q in the standard way: r]{e) = qo and ri{ax) = p(p(x),a). Intuitively, a 
transducer is a labeled finite graph with a designated start node, where the edges are 
labeled by T and the nodes are labeled by E. A i7-labeled T-tree {T* ,t) is regular if 
there exists a transducer T> = (T, E, Q, qo, rj, L), such that for every x G T*, we have 
t(x) = L{rj{x)). The size of {T* ,f), denoted ||t||, is \Q\, the number of states of V. 

Alternating automata on infinite trees generalize nondeterministic tree automata and 
were first introduced in [MS87]. Here we describe alternating two-way tree automata. 
For a finite set X, let B^{X) be the set of positive Boolean formulas over X (i.e.. 
Boolean formulas built from elements in X using A and V), where we also allow the 
formulas true and false, and, as usual, A has precedence over V. For a set F C A 
and a formula 9 G B'^{X), we say that Y satisfies 9 iff assigning true to elements in 
Y and assigning false to elements in A \ F makes 9 true. For a set T of directions, 
the extension of T is the set ext{T) = T U {e, t} (assuming T n {£, t} = 0 )- An 
alternating two-way automaton over A-labeled T -trees is Al = {E, Q, qo, S, F), where 
E is the input alphabet, Q is a finite set of states, go G <5 is an initial state, 6 : Qx E ^ 
B^{ext{T) X Q) is the transition function, and F specifies the acceptance condition. 

A run of an alternating automaton A over a labeled tree (T*, V) is a A^-labeled 
T-tree, for some set T of directions, where Er = T* x Q. The root e of is labeled 
by {e, qo). The labels of a node and its successors have to satisfy the transition function: 
Consider y G with r{y) = (x,g) and 5{q,V{x)) = 9. Then there is a (possibly 
empty) set S C ext{T) x Q, such that S satisfies 9, and for all (c, q') G S, there is 
7 G T such that 7' y G and the following hold: (a)Ifc G T, thenr(7-y) = (c-x, g'). 
(b) If c = £, then r{y ■ y) = (x, g'). (c) If c =t, then x = v ■ z, for some v G T and 
z G T*, and r(7 • y) = {z, q'). 

A run (Tr, r) is accepting if all its infinite paths satisfy the acceptance condition. 
We consider here parity acceptance conditions [EJ91]. A parity condition over a state 
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set Q is a finite sequence F = {Fi, F 2 , . . . , Fm} of subsets of Q, where Fi C F2 C 
• • • C Fm = Q. The number m of sets is called the index of A. Given a run (T^, r) 
and an inhnite path tt C Tr, let inf{Tr) C Q be such that q G inf{F) if and only if 
there are infinitely many y G tt for which r{y) G T* x {q}. That is, inf^ir) is the set 
of states that appear inhnitely often in tt. A path tt satisfies the condition F if there is 
an even i for which m/(7r) n 7^ 0 and m/(7r) n Fi_i = 0. An automaton accepts a 
labeled tree if and only if there exists a run that accepts it. We denote by C{A) the set 
of all 27-labeled trees that A accepts. The automaton A is nonempty iff C{A) 7 ^ 0. The 
Biichi condition F C Q is equivalent to the parity condition (0, F, Q) [Biic62]. A path 
TT satishes the Biichi condition F iff inf{Tr) n F 7 ^ 0. 

The size of an automaton is determined by the number of its states and the size of 
its transition function. The size of the transition function is \rj\ = Eq^QEaes\v{q, o)| 
where, for a formula in B~^{ext{T) x Q) we define |(Z\, g)| = |true| = |false| = 1 
and 1 01 V 02 1 = 1 01 A 02 1 = |0i| + |02| + 1- 

We say that A is one-way if S is restricted to formulas in F+ (T x Q) , it is advancing 
if 6 is restricted to F+((7’U{e}) x Q). We say that A is nondeterministic if its transitions 
are of the form Vie/ A^er(^’ such cases we write 6 : Q x S ^ 2*^'^ . In the 

case that |T| = 1, is a word automaton. 

Given an alternating two-way parity tree automaton A with n states and index k, 
we can construct an equivalent nondeterministic one-way parity tree automaton whose 
number of states is exponential in nk and whose index is linear in nk [Var98], and we 
can check the nonemptiness of A in time exponential in nk [EJS93]. The membership 
problem of an automaton A and a regular tree {T* , r) is to determine whether A accepts 
(T* ,t)\ or equivalently whether {T*,t) G c\a). For q G Q and w G T*, we say that 
A accepts {T*,t) from {q, w) if there exists an accepting run of A that starts from state 
q reading node w (i.e. a run whose root is labeled by {w, q) and satisfies the transition). 
The global membership problem of A and regular tree {T* , r) is to determine the set 
{(g, w) I A accepts (T*, r) from (q, m)}. 

We use acronyms in {1,2} x {A, N} x {B, P} x {T, IF} to denote the different 
types of automata. The first symbol stands for the type of movement: 1 for 1-way or 
advancing automata (we often omit the 1) and 2 for 2- way automata. The second symbol 
stands for the branching mode: A for alternating and N for nondeterministic. The third 
symbol stands for the type of acceptance: B for Biichi and P for parity, and the last 
symbol stands for the object the automaton is reading: W for words and T for trees. For 
example, a 2APT is a 2-way alternating parity tree automaton and an NBW is a 1-way 
nondeterministic Biichi word automaton. 

Alternating Automata on Labeled Transition Graphs. Consider a labeled transition 
graph G = {F, S, L, p, sq). Let A = {e, □, O}. An alternating automaton on labeled 
transition graphs {graph automaton, for short) [Wil99]^ is a tuple S = {S, Q, qo, S, F), 
where S, Q, qo, and F are as in alternating two-way automata, and S : Q x B ^ 
B^{A X Q) is the transition function. Intuitively, when S is in state q and it reads a 
state s of G, fulhlling an atom (O, t) (or Of, for short) requires S to send a copy in 
state t to some successor of s. Similarly, fulhlling an atom Ot requires S to send copies 



^ See related formalism in [JW95]. 
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in state t to all the successors of s. Thus, graph automata cannot distinguish between 
the various successors of a state and treat them in an existential or universal way. 

Like runs of alternating two-way automata, a run of a graph automaton S over a 
labeled transition graph G = {E, S, L, p, sq) is a I7r-labeled F -tree (T^, r), where F is 
some set of directions, Er = S x Q. The root e of is labeled by (sq, Qo). The labels 
of a node and its successors have to satisfy the transition function: Consider y with 
r(y) = (s, q) and S{q, L{s)) = 9. Then there is a (possibly empty) set S' C Z\ X Q, 
such that S satisfies 9, and for all (c, q') G S, we have: (a) If c = e, then there is 7 G T 
such that 7 • y G Tr and r (7 • y) = (s, q'). (b) If c = □, then for every successor s' of 
s, there is 7 G C such that ■y ■ y G Tr and r(j ■ y) = (s', q'). (c) If c = O, then there is 
a successor s' of s and 7 G T such that y - y GTr and r{y ■ y) = (s', q'). Acceptance is 
defined as in 2APT runs. The graph G is accepted by S if there is an accepting run on 
it. We denote by £{S) the set of all graphs that S accepts and by 5^ = {E, Q, q, S, F) 
the automaton S with q as its initial state. 

We use graph automata as our branching time specification language. A labeled 
transition graph G satisfies a graph automaton S, denoted G ^ 5, if 5 accepts G. 
Graph automata have the same expressive power as the /i-calculus. Formally, given a p.- 
calculus formula ip, of length n and alternation depth k, we can construct a graph parity 
automaton such that £{S^) is exactly the set of graphs satisfying ip. The automaton 

has n states and index k [Wil99]. 

We use NEW as our linear time specification language. A labeled transition graph 
G satisfies an NEW N, denoted G |= W, if 7^^ n L{N) ^ 0 (where sq is the initial 
state of GY. We are especially interested in cases where E = 2^^, for some set AP of 
atomic propositions AP, and in languages L C (2^^)‘^ definable by NEW or formulas 
of the linear temporal logic LTL [Pnu77]. For an LTL formula (p, the language of ip, 
denoted L{p), is the set of infinite words that satisfy p. For every LTL formula p, we 
can construct an NEW N^p with states such that L{Np) = L{p) [VW94]. 

Given a graph G and a specification S, the global model-checking problem is to 
compute the set of configurations s of G such that G® |= 5. Whether we are interested 
in branching or linear time model-checking is determined by the type of automaton. 

3 Global Membership for 2APT 

In this section we solve the global membership problem for 2APT. Consider a 2APT 
A = {E, S, So, p, Oi) and a regular tree T = (T*,t). Our construction consists of a few 
stages. First, we modify A into a 2APT A' that starts its run from the root of the tree 
in an idle state. In this idle state it goes to a node in the tree that is marked with a state 
of A. From that node, the new automaton starts a fresh run of A from the marked state. 
We convert A' into an NPT V [Var98]. Second, we combine V with an NET T>' that 
accepts only trees that are identical to the regular tree T and in addition have exactly 
one node marked by some state of A. We check now the emptiness of this automaton 
A" ■ From the emptiness information we derive an NFW N that accepts a word w G T* 

Notice, that our definition dualizes the usual definition for LTL. Here, we say that a linear time 
specification is satisfied if there exists a trace that satisfies it. Usually, a linear time specification 
is satisfied if all traces satisfy it. 




394 



Nir Piterman and Moshe Y. Vardi 



in state s G S (i.e. the run ends in state s of A; state s is an accepting state of N) iff A 
accepts T from {s,w). 

Theorem 2. Consider a 2APT A = {2J, S, Sq, p, a) and a regular tree T = (T*, r). 
We can construct an NFW N = (T, R' U 5, rg, A, S) that accepts the word w in state 
s G S iff A accepts T from (s, w). Let n be the number of states of A and h its index; 
the NFW N is constructible in time exponential in nh and polynomial in | |t| |. 

Proof Let S+ = S {_L} and T = {v \, . . . , Vk}- Consider the 2APT A' = {E x 
5'+, S' , Sg, p' , a') where S' = S C {sg}, Sg is a new initial state, a' is identical to a 
except having Sg belonging to some odd set of a' , and p' is dehned as follows. 

[ P{s, a) sf- s'o 

p'(s,(CT,f)) = <^ V^er('*^>So) s = s'gandf = _L 

[ V„Gr('*^> s'o) V (e, s') s = s'g and t = s' 

Clearly, A' accepts a (27 x S'-!.) -labeled tree T' iff there is a node x in T' labeled by 
(cr, s) for some (a, s) G E x S and A accepts the projection of T' on E when it starts 
its run from node x in state s. Let 7^ = (E x S+, P,po, pi, ai) be the NPT that accepts 
exactly those trees accepted by A' [Var98]. If A has n states and index h then V has 
states and index 0{nh). 

Let V = (T, E, Q, qo, rj, L) be the transducer inducing the labeling r of T. We 
construct an NET V that accepts {E x S'+)-labeled trees whose projection on E is 
T and have exactly one node marked by a state in S. Consider the NET V = {E x 
Q X {_L, T}, (gg, _L), rj' , Q x {T }) where p' is defined as follows. For g e Q let 
pend^{q) = {{p{q,vi),T), (p{q,Vi),E), {p{q,Vk),T)) be the tuple where the 
j-th element is the -successor of g and all elements are marked by T except for the 
Lth element, which is marked by _L. Intuitively, a state (g, T) accepts a subtree all of 
whose nodes are marked by _L. A state (g, _L) means that D' is still searching for the 
unique node labeled by a state in S. The transition to pendi means that D' is looking 
for that node in direction Vi G T. 

( ct , 7 )) = 

f vi),T), ... ,{p{q,Vk),T))} /3 = T, 7 = _L and cr = L(g) 

J {((??(<?,t'i),T),...,(p(g,i;fe),T))}/3 = _L, jG S and a = L{q) 

I {pendi{q) \ i G [l-.fc]} /3 = 7 = _L and a = L{q) 

[ 0 Otherwise 

Clearly, V accepts a (27 x S'+)-labeled tree T' iff the projection of T' on 27 is exactly 
T and all nodes of T' are labeled by _L except one node labeled by some state s G S. 
Let A" = (27 X S'+, R, tq, 6, af) be the product of V and V where R = {Q x 
X P,rQ = ((go, _L),po), S is defined below and a 2 = {F[, . . . , F'ff) is obtained 
from Qfi = (Fi , . . . , Fm) by setting F[ = {{Qx {_L, T}) x Fi) U (Q x {_L} x P) and for 
i > 1 we have = (<3 x {T }) x Fi. Thus, _L states are visited finitely often, and only 
the state of 7^ is important for acceptance. For every state {{q,(3),p) G (Qx{E,T})xP 
and letter (a, j) G E x S+ the transition function S is defined by: 



<5(((9,/3),p), (o-,7)) = 

|(((9i,/3i),Pi), ■ • • . ((Qk,/3k),Pk)) 



(pi,...,Pk) G pi{p, (cr, 7 )) and 
((gi,/7i),...,(gfc,/7fe)) G //'((g, /7), (ct, 7 )) 
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A tree T' accepted by has a unique node x labeled by a state s of A, all other nodes 
are labeled by _L, and if T is the projection of T' on E then A accepts T from (s, x). 

The number of states of A!' is |jr|| • (n h)OAh) jj-j index is 0{nh). We can 
check whether A!' accepts the empty language in time exponential in nh. The emptiness 
algorithm returns the set of states of A!' whose language is not empty [EJS93]. From 
now on we remove from the state space of A!' all states whose language is empty. Thus, 
transitions of A!' contain only tuples in which all states have non empty language. 

We are ready to construct the NFW N . The states of N are the states of A!' in (Q x 
{_L}) X P in addition to S (the set of states of A). Every state in S is an accepting sink 
of N. For the transition of N we follow transitions of _L-states. Once we can transition 
into a tuple where the _L is removed, we transition into the appropriate accepting states. 

Let N = {T, R' U S', To , A, S), where R' = RD {Q x {_L} x P), ro is the initial state 
of A!' , S is the set of states of A (accepting sinks in N), and A is defined as follows. 
Consider a state {{q, -L),p) € R'. For every tuple ( ((gi, T),pi), . . ., ((gj, ±),pi), . . ., 
(( 9 fc,T),pfe) ) in (i(((g,_L),p),(P(g),_L)), we add ((g,,_L),pi) to A{{{q, ±),p),Vi). 
For every tuple (((gi, T),pi), . . . , {{qk,T),pk)) in <5(((g, -L),p), (L(g), s)), we add s 
to A{{{q, -L),p), e). In the full version we show that N accepts w G T* in a state s G S 
iff A accepts T from {w, s). 



4 Two-Way Path Automata on Trees 

Path automata on trees are a hybrid of nondeterministic word automata and nondeter- 
ministic tree automata: they run on trees but have linear runs. Here we describe two- 
way nondeterministic Biichi path automata. We introduced path automata in [KPV02], 
where they are used to give an automata-theoretic solution to the local linear time 
model checking problem. A two-way nondeterministic Biichi path automaton (2NBP, 
for short) on 27-labeled T-trees is a 2ABT where the transition is restricted to disjunc- 
tions. Formally, S = (27, P,po, 6, F), where 27, P, po, and F are as in an NBW, and 
6 : P X E ^ is the transition function. A path automaton that visits the 

state p and reads the node x € T chooses a pair {d,p') G S{p, t{x)), and then follows 
direction d and moves to state p' . It follows that a run of a 2NBP on a labeled tree 
(T*,t) is a sequence of pairs r = (xo,Po), (xi,pi), ■ ■ ■■ The run is accepting if it visits 
F infinitely often. As usual, £(5) denotes the set of trees accepted by S. 

We studied in [KPV02] the emptiness and membership problems for 2NBP. Here, 
we consider the global membership problem of 2NBP. Consider a 2NBP S = { E, P, 
Po, S, F ) and a regular tree (T*, r). Just like in the case of 2APT, we first modify S 
into a 2NBP S' that starts its run from the root of the tree in an idle state. In this idle 
state it goes to some node in the tree. From that node, the new automaton starts a fresh 
run of S from some state in P. We then construct an ABW A such that A accepts the 
word a“ iff S' accepts the tree T. We check the emptiness of A, and from the emptiness 
information derive an NFW N that accepts a word w G T* in state p G P (i.e. the run 
ends in state p of 5; state p is an accepting state of TV) iff S accepts (T*, r) from (p, w). 

Theorems. Consider a 2NBP S = {E, P,pq,Sj F) <^^d. a regular tree (T*,t). We 
can construct an NFW N = {T, Q' U P, go. A, P) that accepts the word w in a state 
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p £ P iffS accepts T from {p, w). We construct N in time 0(|Pp • |(5| • |jr|| ) and space 

0{\P?-\\r\\). 

Proof. Consider the 2NBP S' = (27, P',po, 5', F) where P' = P LI {po} and po f: P 
is a new state, for every p £ P and cr € 27 we have S'{p,a) = S{p,a), and for 
every ct G 27 we have S'{po, cr) = V^6t(po, v) V \Jp^p{e,p). Thus, S' starts reading 
(T* , r) from the root in state pq, the transition of po includes either transitions down the 
tree that remain in state pq or transitions into one of the other states of S. Thus, every 
accepting run of S' starts with a sequence {po,wo),{po,wi), . . . ,{po,Wn),{p,Wn), — 
Such a run is a witness to the fact that S accepts (T*,t) from {p, Wn). We would like 
to recognize all words w £ T* and states p' £ P for which there exist runs as above 
with p = p' and Wn = w. 

Consider the regular tree {T* , t). Let T>r be the transducer that generates the labels 
of r where T>t = (T, 27, Dr, df, pr, Lr). For a word w £ T* we denote by Pr{w) 
the unique state that T>r gets to after reading w. In [KPV02] we construct the ABW 
A = ({a},Q,qo,v,F') where Q = (P'U(P' xP')) x Dr x {_L,T},go = (Po,d°,±), 
and F' = {F X Dr x {_L}) U {P' x Dr x {T }). We use the following definitions. Two 
functions fa'P'xP'^ {_L, T } where a £ {_L, T }. 



f±{p,q) = A fT{p,q) 



F if p£F or q£F 
T otherwise 



For every state p £ P' and letter ct G 27 the set Cf is the set of states from which p 
is reachable by a sequence of e-transitions reading letter ct and one final ] -fransition 
reading ct. Now rj is defined for every sfafe in Q as follows. 

Vp'6P' V/3 g{-L,t}(c(p,p'.c?, /?)) A {e,{p',d, (3)) 
r]{p,d,a) = y V^GrV(^.p'>G5'(p,L,(d))(a:(p'.PT(d,i^),-L)) 
V(e,p')eY(p,L^(d))(^i -L)) 



r]{Pi,P2, d,a) = 

V(e.p'>e-5'(pi,L^(d))(c(p',P2,d,/a(p',P2))) 
ww W f{e,{Pi,P',d,fp,{pi,p')))A\ 

V Vp'GP' V/3i+/32=a (e,{p',p2,d,fp^{p',P2))) ) 

V„er.(^.p'>e5'(pi.L.(d)) \/ p„^c^rw{a,{p' ,p" , pr{d,v), U{p' ,p"))) 

Finally, we replace every sfafe of the form {{p,p,d,a) | p G F or a = _L} by true. 

In [KPV02] we show that C{A) f 0 iff {T* , r) G C(5') by translating an accepting 
run of S' on {T* , t ) into an accepting run tree of A on a“ and vice versa. As shown 
in [KPV02] the number of states of A is 0(|Pp • ||r||) and the size of its transition 
is 0(|(5| • |Pp • ||r||). It is also shown there that because of the special structure of A 
its emptiness can be computed in space 0{\P\^ ■ |jr||) and in time 0(|(5| • |Pp • ||r||). 
From the emptiness algorithm we can get a table T : Q ^ {0, 1} such that T{q) = 1 
iff L{A'^) 7 ^ 0. Furthermore, we can extract from the algorithm an accepting run of A'^ 
on a“. It follows that in case (p, d,a) £ P x Dr x {_L, T} the run is infinite and the 
algorithm in [KPV02] can be used to extract from it an accepting run of P on the regular 
tree (T*, r^). If (p,p',d,a) £ P x P x Dr x {_L, T} the run is finite and the algorithm 
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in [KPV02] can be used to extract from it a run of P on the regular tree (T*, Td) that 
starts in state p and ends in state p' both reading the root of T* . 

We construct now the NFW N. Let N = {T, Q' UP,qo, A, P) where Q' = ({po} U 
({Po} X P)) X Dt X {_L, T } and P is the set of states of S (that serves also as the set of 
accepting states), qg = _L) is the initial state of A, and A is defined as follows. 

Consider a state {po,d,<P) G Q' . For every v G T such that the language of 
{po,Pr{d,v),±) is not empty, and {(a, (po, Pr(((, w), -L))} h r]{{po,d,a)), we add 
(PO) Prid, v), _L) to A{{po, d, a), v). For every state p G P such that the language of 
(po,p, d, P) is not empty, the language of (p, d, P) is not empty, and {(e, (po,p, d, P)), 
{e,p, d,P)} 1= rj{{po,d, a)), we add (po,p, d, P) to A{{po, d, a),e). For every state p G 
P such that the language of (p, d, _L) is not empty, and {(e, (p, d, _L))} \= p((po, d, a)), 
we add (the accepting state) p to A{{po, d, a),e). 

Consider a state (po, p, d, a) G Q' . For every v G T and for every p' G such 

that the language of (po, p', Pr {d, v ) , /„ (po ,p')) is not empty, and { (a, (po, p', Pr {d, v), 
/a(po,p')))} h v{{Po,P,d,a)), we add the state {po,p' , Pr{d,v), fa{po, p')) to the 
transition A{{po,p',d,a),v). For every p' such that the language of (p',p, d, fa{p',p)) 
is not empty, and {(e, (p',p, d, fa{p',p)))} \= v{{po,P, d, a)), we add the state p' to 
the transition A{{po,p,d,a),e). For every state p' G P such that the language of 
{po,p' ,d, Pi) is not empty, the language of {p' ,p, d, P 2 ) is not empty, and we have 
{(e, (po,p',d,Pi)), {e, {p',p,d,P))} |= rj{{po,p,d,a)),we add the state (po,p',d,Pi) 
to the transition A{{po, p, d, a),s). 

This completes the definition of the automaton. In the full version we prove that N 
accepts m G T* in a state p G P iff S accepts {T* , t) from (p, w) . 

5 Global Model Checking 

In this section we solve the global model-checking problem by a reduction to the global 
membership problem. The constructions are somewhat different from the constructions 
in [KV00,KPV02] as we use the global-membership of 2APT and 2NBP instead of 
the emptiness of 2APT and membership of 2NBP. We start with branching time model 
checking and then proceed to linear time. 

Consider a rewrite system i? = (P, V, Q, L, T). Recall that a configuration of R 
is a pair {q, x) G Q y. V* . Thus, the store x corresponds to a node in the full infinite 
V -tree. An automaton that reads the tree V* can memorize in its state space the state 
component of the configuration and refer to the location of its reading head in V* as the 
store. We would like the automaton to “know” the location of its reading head in C* . A 
straightforward way to do so is to label a node a; G C* by x. This, however, involves an 
infinite alphabet. We show that labeling every node in V* by its direction is sufficiently 
informative to provide the 2APT with the information it needs in order to simulate 
transitions of the rewrite system. Let (C* ,Ty) be the tree where (x) = dir{x) . 

Theorem 4. Given a pushdown or a prefix-recognizable system R = (27, V, Q, L, T) 
and a graph automaton W = (27, VF, wq , S, F), we can construct a 2APT A on V - 
trees and a function f that associates states of A with states of R such that A accepts 
(V*,Ty) from (p,x) \= W. The automaton A has 0{\Q\ ■ ||T|j • |1X|) states, 

and has the same index as W. 
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The construction in Theorem 4 reduces the global model-checking problem to the 
global membership problem of a 2APT. By Theorem 2, we then have the following. 

Theorem 5. Global model-checking for a pushdown or a prefix-recognizable system 
R = (27, y, Q, L, T) and a graph automaton W = (27, W, wq, <5, F), can be solved in 
time exponential in nk, where n = \Q\ ■ ||T|| • \ V\ and k is the index ofW. 

According to [Wil99], we can conclude with an EXPTIME bound also for the 
global model-checking problem of /i-calculus formulas, matching the lower bound in 
[Wal96]. Note that the fact the same complexity bound holds for pushdown and prefix- 
recognizable rewrite systems stems from the different definition of | j T | j in the two cases. 

We now turn to linear time specifications. As branching time model-checking is ex- 
ponential in the system and linear time model-checking is polynomial in the system, we 
do not want to reduce linear time model-checking to branching time model-checking. 

As before, the 2NBP reads the full infinite V -tree. It uses its location as the store 
and memorizes as part of its state the state of the rewrite system. Eor pushdown systems 
it is sufficient to label a node in the tree by its direction. For prefix-recognizable systems 
we label a node x, in addition to its direction, by the regular expressions /? for which 
X & (3. We denote this tree by {V* , rf) and its size is exponential in \Qis\. 

Theorem 6. Given a pushdown or a prefix-recognizable system R = (27, V, Q, L, T) 
and an NEW N = (27, W, Wo,rj, F), we can construct a 2NBP S on V -trees and a 
function f that associates states of S with states of R such that S accepts {V*,Ty) 
(or {V* ,Tff)) from (s,x) |= N. The size of the transition function of S is 

0(||T|| • |7V|) and it has 0(|<5| • HTH • | A^|) states in the case of pushdown systems, and 
0{\Q\ ■ i\Qa\ + |<57l) • |r| • |iV|) states in the case of prefix-recognizable systems. 

Combining Theorem 3 and Theorem 6 we get the following. 

Theorem 7. Global model-checking for a rewrite system R and NEW N is solvable in 
time 0((||T|| • |iV|)^) and space 0((||T|| • |7V|)^) when R is a pushdown system and 
in time (||r|| • \N\)^ • and space (|T| • |7V|)^ • when R is a prefix- 

recognizable system. 

Our complexity coincides with the one in [EHRSOO], for pushdown systems, and 
with the result of combining [EKSOl] and [KPV02], for prefix-recognizable systems. 

6 Conclusions 

We have shown how to extend the automata-theoretic approach to model-checking in- 
finite state sequential rewrite systems to global model-checking. In doing so we have 
shown that the restriction of automata-theoretic methods to local model-checking is not 
an inherent restriction of this approach. Our algorithms generalize previous automata- 
theoretic algorithms for local model-checking [KV00,KPV02]. The complexity of our 
algorithm matchs the complexity bounds of previous algorithms for global model- 
checking [EHRS00,EKS01,KPV02,Cac02] and show that a uniform solution exists in 
the automata-theoretic framework. 
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We believe that our algorithms generalize also to micro-macro stack systems [PV03] 

and to high order pushdown systems [KNU03,Cac03] as the algorithms for local model- 
checking over these types of systems are also automata- theoretic. 
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Abstract. We study the problem of formally verifying shared memory 
multiprocessor executions against memory consistency models - an im- 
portant step during post-silicon verification of multiprocessor machines. 
We employ our previously reported style of writing formal specifications 
for shared memory models in higher order logic (HOL), obtaining in- 
tuitive as well as modular specifications. Our specification consists of a 
conjunction of rules that constrain the global visibility order. Given an 
execution to be checked, our algorithm generates Boolean constraints 
that capture the conditions under which the execution is legal under the 
visibility order. We initially took the approach of specializing the mem- 
ory model HOL axioms into equivalent (for the execution to be checked) 
quantified boolean formulae (QBF). As this technique proved inefficient, 
we took the alternative approach of converting the HOL axioms into a 
program that generates a SAT instance when run on an execution. In 
effect, the quantifications in our memory model specification were real- 
ized as iterations in the program. The generated Boolean constraints are 
satisfiable if and only if the given execution is legal under the memory 
model. We evaluate two different approaches to encode the Boolean con- 
straints, and also incremental techniques to generate and solve Boolean 
constraints. Key results include a demonstration that we can handle exe- 
cutions of realistic lengths for the modern Intel Itanium memory model. 
Further research into proper selection of Boolean encodings, incremen- 
tal SAT checking, efficient handling of transitivity, and the generation of 
unsatisfiable cores for locating errors are expected to make our technique 
practical. 



1 Introduction 

In many areas of computer design, formal verification has virtually eliminated 
logical bugs escaping into detailed designs (including silicon). However, in ar- 
eas where the system complexity is high, and global interactions among large 
collections of subsystems govern the overall behavior, formal verification cannot 
yet cope with the complex models involved. The verification of multiprocessor 
machines for conformance to shared memory consistency models [1] is one such 
area. This paper focuses on verifying whether multiprocessor executions violate 
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memory ordering rules. These executions may be obtained from multiprocessor 
simulators or from real machines. The current practice is to employ well-chosen 
test programs to obtain a collection of “interesting” executions from machines 
and simulators. These executions are then examined using ad hoc “checker” pro- 
grams. Our contribution is to make the second step formal. 

It is crucially important that multiprocessor machines and simulators con- 
form to their memory models. Future high performance operating systems will 
exploit relaxed memory orderings to enhance performance; they will fail if the 
multiprocessor deviates from its memory model. However, as far as we know 
from the literature, none of the existing methods can verify executions against 
formal descriptions of industrial memory models. A tool such as what we propose 
can also help designers comprehend a given memory model by executing critical 
code fragments. Given that industrial memory models are extremely complex, 
an efficient execution verification facility is very important in practice. 

In this paper, we show that Boolean satisfiability (SAT) based tools can be 
developed for verifying executions of realistic lengths. Our current work is aimed 
at the Intel Itanium memory model [2]; the technique is, however, general. Given 
a shared memory multiprocessor execution, e, and a formal specification of the 
memory model as logical axioms, r, we offer a formal technique to verify whether 
e is allowed under r. By the term execution^ we mean multiprocessor assembly 
programs over loads, stores, fences, and other memory operations, with the loads 
annotated with returned values. The actual assembly program run on a machine 
may consist of instructions other than loads and stores; it may, for instance, 
include branches and arithmetic operations. In those cases, e retains a dynamic 
trace of just the load and store group of instructions, with the loads annotated 
with their returned values. In this paper, we will depict e in the form of assembly 
programs consisting of only load and store instructions, with the loads annotated 
with the returned values (such annotated programs are called “litmus tests”). 
We do not discuss here how such dynamic traces can be obtained. 

Gibbons and Korach [3] have shown that the problem of checking execu- 
tions against sequential consistency [4] is NP-complete. Generalizing this result, 
Gantin [5] has shown that the problem of checking executions against memory 
ordering rules that contain coherence as a sub-rule is NP-hard. Since the Itanium 
memory model contains coherence, and since executions serve as polynomial cer- 
tificates that can be checked against the rules in polynomial time, we have an 
NP-complete problem at hand. Despite these results, we initially found it nat- 
ural to employ a quantified boolean formula (QBF) [6] satisfiability checker. 
This is because of two facts: (i) the Itanium memory model is quite complex, 
and to write a formal specification in a declarative and intuitive style, we em- 
ployed higher order logic (HOT) [7-9]; (ii) since we wanted to have a trustworthy 
checking algorithm, we took the approach of specializing the HOT description 
to a QBF description so as to check e. This specialization is natural, given that 
r captures the memory model in terms of quantifiers that range over program 
counters, addresses, and data values that have arbitrary ranges, whereas e has 
these quantities occurring in it over a finite range. However, the direct use of a 
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PO: St a,l; Id rl,a <1>; st b,rl <1>; 

PI: Id.acq r2,b <1>; Id r3,a <0>; 



[{id=0;proc=0; pc=0; op=St; var=0; data=l; wrID=0; 

wrType=Local; wrProc=0; reg=-l; useReg=false}; 

{id=l; proc=0; pc=0; op=St; var=0; data=l; wrID=0; 

wrType=Remote; wrProc=0; reg=-l; useReg=false}; 

{id=2; proc=0; pc=0; op=St; var=0; data=l; wrID=0; 

wrType=Remote; wrProc=l; reg=-l; useReg=false}; 

{id=3; proc=0; pc=l; op=Ld; var=0; data=l; wrID=-l; 

wrType=-l; wrProc=-l; reg=0; useReg=true}; 

{id=4; proc=0; pc=2; op=St; var=l; data=l; wrID=4; 

wrType=Local; wrProc=0; reg=0; useReg=true}; 

{id=5; proc=0; pc=2; op=St; var=l; data=l; wrID=4; 

wrType=Remote; wrProc=0; reg=0; useReg=true}; 

{id=6; proc=0; pc=2; op=St; var=l; data=l; wrID=4; 

wrType=Remote; wrProc=l; reg=0; useReg=true}; 

{id=7; proc=l; pc=0; op=LdAcq; var=l; data=l; wrID=-l; 

wrType=-l; wrProc=-l; reg=l; useReg=true}; 

{id=8; proc=l; pc=l; op=Ld; var=0; data=0; wrID=-l; 
wrType=-l; wrProc=-l; reg=2; useReg=true} ] 

Fig. 1. The execution of a multiprocessor assembly program, and the tuples it gener- 
ates. 



QBF solver[10] proved to be of impractical complexity. Therefore, we pursue the 
following alternative approach. We first derive a mostly^ applicative functional 
program p from r . Program p captures the quantifications present in r via iter- 
ative loops (tail-recursive calls) . It also stages the evaluation of the conditionals 
in an efficient manner. Such a program p, when run on execution e, evaluates all 
the ground constraints (constraints without free variables) efficiently by direct 
execution, and generates non-ground constraints in the form of a SAT instance 
b which is satisfiable if and only if e is allowed under r. Further we demonstrate 
that the derivation of p can be automated in a straightforward manner. 



Related Work: Park et.al. [11] wrote an operational description of the Sparc 
V9 [12] memory models in Murphi [13] and used it to check assembly language 
executions. It is our experience is that this approach does not scale beyond a 
dozen or so instructions; it is also our experience that specifications for memory 
models as intricate as the Itanium are very difficult to develop using the oper- 
ational approach [14]. Since our HOL specification follows the axiomatic style 
used in Intel’s description [2], it can be more easily trusted. It also can be for- 
mally examined using theorem provers to enhance our confidence in it. In [15], 
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we show that a whole range of memory models can be described in the same 
HOL style as we use here. 

Yu [16] captured memory ordering rules for the Alpha microprocessor [12] as 
first-order axioms. Given an execution to be checked, they generated verification 
conditions for the decision procedure Simplify [17]. We believe that the use of 
SAT for this application will scale better. 

In previous work [18], we presented the higher order logic specification of 
the Itanium memory model and its realization as a constraint logic program. We 
also sketched an approach to generate Boolean satisfiability constraints. Three 
major problems remained: (i) the constraint logic program version was unable 
to handle more than about 20 tuples (a dozen or so instructions); (ii) the SAT 
version was extremely difficult to debug owing to it being retrofitted into a logic 
program; (iii) since the logic program did not exploit the nature of the higher 
order logic axioms, it took far more time to generate SAT instances than to 
solve them - often with a ratio of 200:1. The present work is an improvement 
in all these regards and also offers several new directions. In particular, it offers 
a reliable formal derivation scheme to obtain the SAT instance generation pro- 
gram. This program can handle much longer executions - about 300 tuples. The 
SAT instances generated from such executions can be solved using SAT tools in 
reasonable time, thanks to the care exercised in selecting the Boolean encoding 
method. We have also identified many avenues to scale the capacity of our tool 
further. 

2 Overview of Our Approach 

As a simple example, consider the litmus test shown in Figure 1. Processor PO 
issues a store (st) instruction to location a with data value 1. It then issues a 
load (id) to location a, which fetches the value 1 into register rl (shown via 
the annotation <1>). It then stores the contents of register rl into location b 
(we show the value annotation <1> here also, as we can compute the value in rl 
at this point). Processor PI issues a load acquire (id.acq) instruction to begin 
with. This fetches value 1 from location b into register r2. It then performs 
an ordinary Id instruction, obtaining 0 from location a that is stored into r3. 
The only strongly ordered operation in this whole program is Id . acq. Itanium 
rules require that the visibility of Id . acq must be before the visibility of all 
the instructions following it in program order (i.e., Id.acq acts as a “one-way 
barrier” ) . The question now is: “is this execution legal?” 

Modeling Executions Using Tuples: Following earlier approaches [2, 19], we 
employ a set of tuples to model executions. One tuple is employed to capture 
the attributes of each Id instruction, and p + 1 tuples are employed to model 
the attributes of each st or st . rel instruction, where there are p processors in 
the multiprocessor (Figure 1). In our example, each store generates three tuples, 
giving us a total of nine tuples^. Of the p+l tuples modeling a store (st), one 

^ We have written an assembler to generate tuples from annotated assembly programs. 
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legal(ops) = 

3order. 

order A WriteOperationOrder ops order A 

order A MemoryDataDependence ops order A 

order A Coherence ops order A 

order A AtomicWBRelease ops order A 

order A NoUCBypass ops order 

IrreflexiveOrder ops order 
TransitiveOrder ops order 
TotallyOrdered ops order 

V(z G ops). V(j G ops). (Aid = j.id) border i j 
V(z G ops). V(j G ops). V(/c G ops). 

(order i j A order j k order i k) 

V(z G ops). V(j G ops), (i G ops) A (j G ops). A ^(Aid = j.id) 
order i j \/ order j i 
V(z G ops). V(j G ops). 

ordByAcquire z j V ordByRelease i j V ordByFence i j 

order i j 

ordByProgram i j A (Aop = LdAcq) 

(i.proc = j.proc) A Ape < j.pc 

V(j 6 ops). 

(isRd j 

validLocalWr ops order j 

V validRemoteWr ops order j 

V validDefaultWr ops order j) 

A (isWr j A j.useReg validRd ops order j) 

3(z G ops). isRd i A (Areg = J-reg) A ordByProgram i j 
A(Adata = j.data) 

A -'{3(k G ops). isRd k A (fc.reg = j.reg) 

A ordByProgram z /c A ordByProgram A: j) 

V(z G ops). V(J G ops). V(A: G ops). 

(Aop = StRel) A (z.wrType = Remote) 

A (k.op = StRel) A (A:.wrType = Remote) 

A (AwrID = /c.wrID) A (attr.of z.var = WB) 

A order i j A order j k 

(j.op = StRel) A (j.wrType = Remote) 

A (j.wrID = AwrID) 

Fig. 2. Excerpts from the Itanium Ordering Rules (For the full spec, see [18]). 
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Fig. 3. Illustration of the nlogn (left) and nn (right) methods. 

is a local store and the remaining p are global stores, one for each processor. 
For example, consider the tuples with id=0, id=l, and id=2. These are tuples 
coming from the store instruction of PO (proc=0), have program counter pc=0. 
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employ variable var=0, and have data=l. The wrID=0 says that these store tuples 
come from the store instruction with id=0. To distinguish where these stores are 
observed, we employ the wrProc attribute, the values of which are 0, 0, and 1 
respectively. Notice that the tuple with id=0 has wrType=Local, and the one 
with id=l has wrType=Remote. (“Remote” means “global” in the parlance of 
[2]). Notice that we employ two tuples, namely the ones with id=0 and id=l, 
both for the local processor proc=0 (PO). This is to facilitate modeling the the 
semantics of load bypassing - the ability of a processor to read its own store 
early. For details, please see [2, 18]. 

The modeling details associated with load instructions are much simpler. 
We simply employ one tuple per Id or Id.acq instruction. The useReg field 
captures whether a register is involved, and the reg field indicates which register 
is involved. All fields with -1 are don’t-cares. 



Overview of the Itanium Ordering Rules: Figure 2 provides roughly a 
fourth of the Itanium ordering rules from our full specification. The legality 
of an execution is checked by legal ops, where ops is the collection of tuples 
obtained from an execution, such as in Figure 1. Note how order, a binary 
relation, is passed around and constrained by all the ordering rules. Basically, 
the definition consists of four distinct parts: (i) StrictTotalOrder, which seeks 
one arrangement of the tuples into a strict total order, (ii) ReadValue, which 
checks that all reads in this strict total order either return the value associated 
with the most recent (in the strict total order) write to the same location, or the 
initial store values, if there is no write to that location, (iii) ItProgramOrder, 
which is weakened program order that orders instructions only if one of them 
is an acquire, a release or a fence, and (iv) all the remaining rules which try to 
recover some modicum of program order. For instance, an instruction i is ordered 
before an instruction j if i is of type Id. acq, as captured by the ordByAcquire 
rule. 

This style of specification, adopted by [2], makes it easier to contrast it with 
sequential consistency. For instance, if we change ItProgramOrder into a reg- 
ular program order relation, and retain ReadValue and StrictTotalOrder, we 
obtain sequential consistency. Since the combination of ItProgramOrder and 
the rules mentioned in (iv) above is weaker than the regular program order rela- 
tion, the Itanium memory model allows more solutions under StrictTotalOrder 
than with regular sequential consistency. Hence the Itanium memory model is 
weaker than sequential consistency. However, the variety of instructions allowed 
under Itanium is more than just load and store. Hence, we can only hope to 
make qualitative comparisons between these models. 



Overview of Boolean Encoding: As far as the relation legal goes, ops of 
Figure 1 is to be viewed as a set of tuples. Notice that StrictTotalOrder seeks 
to arrange the elements of ops into a strict total order such that the remain- 
ing constraints are met (the arrangement of the elements of ops is captured in 
the order relation). Total ordering among n tuples can be encoded using auxil- 
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iary Boolean variables in two obvious ways (Figure 3, also see [20] where these 
are called the small domain and the approaches): (i) the nlogn approach, 
in which a bit- vector of log(n) Boolean variables of the form [ordij-i . . . ordi^] 
are augmented to the fth tuple (example tuples are shown as tO through t3 in 
the figure). Here, n is the number of tuples, assumed to be a power of 2, and 
j = log 2 (n); (ii) the nn approach, in which Boolean variables (denoted by 
matrixij, with 0 < i, j < n) are introduced to represent how tuples are ordered. 
In the nlogn approach, StrictTotalOrder is implemented by the constraint 
[ordi^iog^(^n)-i, ■ ■ ■ ,ordi,o] yf [ordjjog^(„)_i, . . . , ordj,o] for all i yf j. In the nn 
approach, StrictTotalOrder is implemented via its constituents: irreflexive, 
transitive, and totally Ordered. Constraint irreflexive is encoded by set- 
ting the diagonal elements of the matrix to 0. Constraint transitive is en- 
coded by generating the formula {matrixij A matrix jk) ^ matrixik- Constraint 
totallyOrdered is encoded by generating the formula matrixij V matrix ji . 

The size of the formula which encodes StrictTotalOrder for the nn method 
is far greater than for the nlogn method. This is largely because of the transitivity 
axiom where we go through every triple of tuples and generate the transitivity 
clause. We plan to investigate other methods discussed in [20] . One key difference 
between our work and that of [20] is that in their setting, a collection of first- 
order equational formulae (or more generally speaking, formulae in separation 
logic involving =, >, and <) are to be checked for validity. In doing so, transitivity 
is applied over the given set of equations. In our case, we are solving for an order 
over the tuples. The number of these tuples is expected to be far higher. In a 
sense, our method searches for the few permutations of the given sequence of 
tuples that are consistent with the memory ordering rules. We hope to investigate 
lazy approaches to handling transitivity as discussed in Section 5. 

A significant advantage of the nn method over the nlogn method in our con- 
text is that it generates much smaller formulae for the rest of the constraints 
other than transitivity. For example, suppose while processing a memory or- 
dering rule we have to specify that some tuple, say tO, appears before an- 
other tuple, say t3, in any allowed total order. This encoding is achieved by 
[ordoiordoo] < [ordsiordgo] in the nlogn method, while simply achieved by as- 
serting matrixQg in the nn method (see Figure 3 for a ‘F in the matrix). These 
trade-offs are studied in Section 4. In effect, we found that despite the use of 

variables as opposed to nlog{n), the nn method is more efficient during SAT 
checking. Similar results are obtained in [20] where SAT-checking is often faster 
under their method (similar to our nn method) than their small domain 
method (similar to our nlogn method) . 

In post-silicon verification, tests on multiprocessor machines are run multiple 
times in the hope of obtaining different load values due to non-deterministic 
interleavings. This naturally fits with the use of incremental SAT methods for 
execution verification. 
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3 Program Derivation from Memory Ordering Rules 

We provide an example of how one rule of Itanium, namely atomicWB Release, 
is transformed into a program; all other rules are handled similarly. The initial 
specification is in Figure 2. Recall from Section 1 that for every store instruc- 
tion, we generate p + \ stores, of which p are considered ‘remote stores.’ Rule 
atomicWBRelease says that all these remote stores form an ‘atomic packet’ in 
the sense that any other event e is strictly before or strictly after all the events 
in this packet. Notice how it is specified by the following axiom which says: if j is 
an event ‘trapped’ between i and fc, then j also belongs to the atomic packet of 
all remote stores. (A note about our notation: we use the generic order relation 
to denote a total order over the set of tuple operations ops. When it comes to 
specifically generating the Boolean constraints, we choose ord or matrix depend- 
ing on the encoding method used. This difference shows up in Table 1(e) in part 
&2 of the results.) 

We now pre-process this specification by applying the contrapositive rule. 
The general idea is to bring ground constraints to the antecedent so that we 
can evaluate them through direct execution. The SAT instances can then be 
generated from the consequent part. The result of this step is a formula with 
three outermost quantifiers (Figure 4, before Quantifier Scope Reduction). If 
we translate this directly into loops, we will obtain a very inefficient program. 
The Quantifier Scope Reduction step takes advantage of the limited scope of 
various sub-formulae and rewrites the quantified expression into a series of staged 
quantifications. This will allow many iterations of outer loops to be cut-off early, 
thus not suffering from the full brunt of the O(n^) complexity. This dramatically 
reduced our SAT-generation time. For example, i.op = StRel depends only on 
i, and so the inner loops are not called for all those instructions that do not pass 
this test. 

The last stage of our translation {SAT-generation program sketch) obtains 
a series of tail-recursive functions capturing the semantics of the quantified ex- 
pression. Here, foldr reduces a given list of arguments (generated by map) using 
conjunction; this is because conjunction is the explicitly provided second oper- 
ation ‘&’ for foldr. The list that is reduced is obtained by mapping the function 
(fn i e{i)) (a Lambda abstraction) on the given list. Forms such as f{i){j) 
are employed as opposed to f{i,j) to signify currying [21]. The main difference 
between the sketch we provide and the actual Ocaml [22] code we employ is that 
the latter emits constraints on-the-fly to a file instead of building an expression 
tree using foldr as shown in our sketch. 

4 Results 

Our program handles all the 17 litmus tests given in [2] except a few that involve 
partial word writes that are currently omitted. These tests ran considerably 
faster than those in [18]. 

Next, we considered executions with 32, 64, and 128 tuples in our experi- 
ments. The complexity of our algorithm depends primarily on the number of 
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Applying contrapositive 

atomicWBRelease(ops, order) = 

V(i G ops). V(j G ops). V(fc G ops). 

(i.op = StRel) A (i.wrType = Remote) 

A (fc.op = StRel) A (fc.wrType = Remote) 

A (i.wrID = fc.wrID) A (attr_of Avar = WB) 

A -'((j.op = StRel) A (j.wrType = Remote) A (j.wrID = i.wrID)) 

=> ^(order(i, j) A order{j, k)) 

Quantifier Scope Reduction 

atomicWBRelease(ops, order) = 

V(i G ops). 

(i.op = StRel) A (j.wrType = Remote) 

A(attr_of Avar = WB) 

=> V(fc G ops). 

(fc.op = StRel) A (fc.wrType = Remote) A (AwrID = fc.wrID) 

=> V(j G ops). 

-i((j.op = StRel) 

A (j.wrType = Remote) 

A(j.wrID = AwrID)) 

=> ^{order[i, j) A order[j, k) 

SAT- generation program sketch 

atomicWBRelease(ops) = forall(i, ops, wb(i)); 

wb(i) = if(-'((attr_of Avar = WB) & (Aop = StRel) & (AwrType = Remote))) 
then true else forall(fc, ops, wbl(j)(fc)); 

wbl(i)(fc) = if(-'((fc.op = StRel) & (fc.wrType = Remote) & (AwrID = fc.wrID))) 
then true else forall(j, ops, wb2(i)(fc)(j)); 
wb2(i)(fc)(j) = if((j.op = StRel) & (j.wrType = Remote) & (j.wrID = AwrID)) 
then true else ^{order(i, j) order {j, k))', 

forall(i, S, e{i)) = foldr(map(fn i e{i)){S), &, true) 

Fig. 4. Sketch of SAT-generation Program Derivation. 



tuples, and far less on the remaining attributes of the tuples. Thus, checking 
8 tuples over 2 processors has nearly the same complexity as checking 2 tuples 
over 8 processors. We selected the instruction mix heavily skewed towards stores 
to reflect a worst-case behavior (more rules pertain to stores than loads). All 
runs were on an AMD Athlon XP2100-I- CPU (1.733 GHz, 1GB memory, Red 
Hat Linux V.9). We used the Satzoo incremental solver [23]. 

We evaluated two approaches, one generating and solving the constraints 
monolithically, and the other using partial evaluation. To motivate the latter 
approach, note that the constraints generated from TotallyOrdered depends 
on the number of tuples - and not on the contents of the tuples. Capitalizing 
on this fact, we pre-generated the constraints pertaining to TotallyOrdered 
for various lengths; call these constraints bin, where n represents the number of 
tuples anticipated in a test program to be given in future. We then loaded these 
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Table 1. Result Tables. 

(a). SAT generation times for nlogn encoding (parts &i and 62 ) • 



^tuples 


Part bi 


Part &2 




time (secs) 


^vars 


T^^clauses 


time (secs) 


T^ivars 


^clauses 


32 


0.219 


20,992 


68,448 


1.635 


92,316 


258,632 


64 


1.213 


101,184 


330,624 


17.178 


852,632 


2,387,664 


128 


5.748 


472,320 


1,544,320 


179.026 


7,777,200 


21,775,520 



(b) . SAT generation times for nn encoding 



^tuples 


Part h\ 


Part 62 




time (secs) 


^vars 


^clauses 


time (secs) 


#vars 


^clauses 


32 


0.509 


67,552 


233,376 


0.100 


8,044 


22,760 


64 


4.311 


532,416 


1,851,200 


0.967 


63,832 


179,792 


128 


34.255 


4,226,944 


14,745,216 


9.095 


509,104 


1,431,200 



(c). ‘Monolithic’ gives the SAT solver execution time for the full SAT instance. 
Golumn Part h\ gives the SAT time for part h\. Part 62 gives the time for SAT after 
resuming from the checkpoint and adding the new constraints. 



^tuples 


1 nlogn encoding 


1 nn encoding | 




monolithic 


Part bi 


Part 62 


monolithic 


Part bi 


Part &2 


32 


9.61 


0.6 


4.3 


0.33 


0.69 


0.05 


64 


247.17 


29.53 


37.6 


2.73 


6.17 


0.5 


128 


aborted 


1341.85 


aborted 


164.8 


145.64 


351.1 



(d). nlogn encoding: 1-Cl, 2-Cl, and 3-Cl give the percentage of clauses with one, two 
and three literals. 



^tuples 


Part 1 


Part 62 




1-Cl (%) 


2-Gl (%) 


3-Cl (%) 


1-Cl (%) 


2-Cl (%) 


3-Cl (%) 


32 


1.449 


46.376 


52.173 


0.064 


71.387 


28.547 


64 


1.219 


46.341 


52.439 


0.024 


71.419 


28.555 


128 


1.052 


46.315 


52.631 


0.010 


71.430 


28.559 



(e). nn encoding: 1-Cl, 2-Cl, and 3-Cl give the percentage of clauses with one, two and 
three literals. 



^tuples 


Part bi 


Part 62 




1-Cl (%) 


2-Cl (%) 


3-Cl (%) 


1-Cl (%) 


2-Cl (%) 


3-Cl (%) 


32 


14.479 


57.013 


28.506 


0.738 


70.685 


28.576 


64 


14.382 


57.078 


28.539 


0.329 


71.006 


28.664 


128 


14.333 


57.110 


28.555 


0.154 


71.143 


28.702 
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constraints into the SAT solver, and created the checkpoint of a runnable image 
of the SAT solver using the ckpt tool [24]^to obtain ckptn ■ Later, when presented 
with a litmus test of length n, we only generated the remaining constraints 
(other than TotallyOrdered) for it; call the resulting constraints b 2 n- We then 
ran ckptin on b 2 n- Table 1(a) provides the time to generate SAT instances for the 
nlogn encoding method for formula parts bi and 62 . Table 1(b) provides these for 
the nn encoding method. Table 1(c) gives the SAT solving time for the nn and 
nlogn methods for a monolithic run, and for running parts b\ and 62 separately. 

The results show that under the nn encoding, it takes longer to generate SAT 
instances for part b\, but considerably shorter for part 62 . The main reason is 
that in our implementation, the number of clauses nc grows as 7n^ + . . . and the 
number of variables as 2 n^ + . . . (later code improvements have brought down 
nc to + .. .). The SAT solving times are uniformly lower for the nn method. 
This is because of the preponderance of clauses with smaller numbers of literals, 
as shown in Tables 1(d) and 1(e). In particular, part 62 of nn encoding has both 
lower number of clauses and a higher proportion of clauses with 1 or 2 literals 
than the nlogn encoding. To summarize: (i) The nn encoding is better in terms 
of SAT solving time. The SAT generation time is acceptably small till about 128 
tuples, (ii) Verifying in two parts b\ and 62 can be advantageous for problems 
of reasonable sizes. The advantage is far more for the nn approach, (iii) Since 
the same test is re-run multiple times, partial evaluation and other incremental 
SAT techniques can play a crucial role in overall efficiency. 

Recently we have run a more realistic test of 130 assembly language instruc- 
tions^. These expanded into 239 tuples. Initially, since the constraint generation 
program could not handle the transitivity rule, we suppressed it, obtaining a SAT 
instance of 115,637 variables and 164,848 clauses. This SAT instance proved to 
be unsatisfiable. Upon deeper examination using the Zcore program (distributed 
with the latest Zchaff [25]), we discovered an unsatisfiable core of 9 clauses. By 
analyzing these clauses, it was discovered that the error we detected resulted 
from us forgetting to initialize the memory state prior to the test. Further ex- 
periments with these and other realistic tests are underway, and our latest results 
will be presented on our webpage [26]. 

To sum up, proper handling of transitivity is crucial to scale our tool further. 
Recent code optimizations have allowed us to handle this realistic example with- 
out suppressing transitivity. However, the complexity of transitivity still lurks 
- in the 400 tuples and above range as of now. Also, the use of unsatisfiability 
core generation tools can be of considerable help in finding the root cause of 
violations. 

5 Concluding Remarks 

We proposed a method for verifying shared memory multiprocessor executions 
where the reference semantics is that of shared memory consistency. We propose 

® We resorted to binary checkpoints - as opposed to clause checkpoints - because the 
source code of Satzoo was not available. 

^ We are deeply indebted to Intel for providing us this test. 
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a method by which executions can be analyzed using programs that embody the 
shared memory consistency rules. The ground part of the constraints in these 
rules are evaluated by the program, and the non-ground parts are emitted as 
Boolean constraints to check. 

Semaphores are currently omitted to retain focus on the overall scalability 
and usability of our tool. Partial-word writes are also not handled. These ex- 
tensions are planned for the future. A rudimentary assembler has been written 
to generate tuples from value-annotated assembly programs. This assembler can 
model data and address dependencies. The Itanium memory model rules in HOL 
were hand-translated into a series of tail-recursive programs; this process is best 
automated to ensure correctness, using the transformation rules illustrated ear- 
lier. 

If the generated SAT instance is satisfiable, the space of satisfying assign- 
ments will reveal the set of allowed executions. Future work will annotate the 
Boolean constraints (clauses) with the instructions as well as memory ordering 
rules that generate them. This way, if the SAT instance is unsatisfiable, the 
unsatisfiability core will reveal which instructions and which memory ordering 
rules are causing the execution to be invalid. Incremental SAT techniques will be 
of great importance to develop, as are hierarchical analysis methods that treat 
groups of instructions atomically. 

Better methods for handling transitivity are needed. One approach would 
be to see if SAT returns a satisfying instance when transitivity is suppressed, 
and if so to selectively introduce transitivity on those elements corresponding to 
the SAT instance. The ability to analyze symbolic executions (where not all the 
execution results are ground) would also enhance the usability of our tool. 
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Abstract. In this paper we propose a method for the deductive verifica- 
tion of out-of-order scheduling algorithms. We use tlpvs, our PVS model 
of linear temporal logic (ltl), to deductively verify the correctness of a 
model based on the Mips RIOOOO design. Our proofs use the predicted 
values method to verify a system including arithmetic and memory oper- 
ations and speculation. In addition to the abstraction refinement tradi- 
tionally used to verify safety properties, we also use fairness constraints 
to prove progress, allowing us to detect errors which may otherwise be 
overlooked. 



1 Introduction 

Modern out-of-order microprocessors use dynamic scheduling to increase the 
number of instructions executed per cycle. These processors maintain a fixed- 
size window into the instruction stream, analyzing the instructions in the window 
and executing them out of order so as to improve performance. However, it is 
typically required that the results of this out of order execution be the same as 
that of a sequential execution of the program. Proving this correlation is non- 
trivial: The out of order scheduling algorithm is often complex, and may use a 
variety of data-structures not used in the sequential algorithm. 

The two prevalent methods for the formal verification of hardware designs are 
model checkinq (e.g. [ABHS99,BCRZ99,JM011) and deductive verification (e.g. 
[SH98,HGS00,CGZ96]). 

There are obvious advantages to the model-checking techniques, the most 
important being that it is fully automatic and requires no strong familiarity with 
the internal details of the design. A very serious limitation of model-checking 
techniques is the limited size of designs which can be fully automatically verified. 

The alternative approach based on deductive verification suffers from no such 
limitations and, in principle, can be used to verify very big designs provided their 
structure is based on regular patterns. The main drawback of the deductive 
approach to reactive system verification (as outlined, for example, in [MP95]) is 
that it is not fully automatic and requires much user ingenuity and supervision. 

* Research supported in part by the John von Neumann Minerva Center for Verifica- 
tion of Reactive Systems, and the European Community 1ST project “Omega”. 
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In the past several years we have considered several out-of-order execu- 
tion designs, developing the predicted values method for proving correctness 
using refinement. This method was applied to Tomasulo / Pentium Il-like mod- 
els [AP99,AP00] and to a much simplified Mips model [APOl]. None of these 
models included memory operations, and progress was not proved. The signif- 
icant amount of human interaction required was a limiting factor in the com- 
plexity of the model, and the types of properties, we could verify. 

For this reason, we developed tlpvs [PA03], a system for the formal veri- 
fication of linear temporal logic (ltl) properties built on the PVS [OSRSCOl] 
verification system. Using tlpvs we were able to apply the predicted values 
method to a significantly more complex model, based on the Mips RIOOOO. In 
verifying what is arguably the most elaborate out-of-order execution mechanism 
verified within the academic community to date, we demonstrate the generality 
of our methods with respect to different execution models, and the feasibility of 
their use on more complex models. 

Furthermore, tlpvs allows us to use fairness conditions to prove progress 
properties, checking for a class of errors likely to be missed otherwise. Most 
abstraction based verification methods for verifying out-of-order execution algo- 
rithms prove that there is an abstract state matching every concrete one, but do 
not ensure that the fairness requirements in the abstract system are met. They 
are therefore unlikely to detect a livelock or deadlock situation. We demonstrate 
progress in the concrete system by refinement to a weakly fair abstract system. 

This paper makes a number of contributions: It is the first report of the suc- 
cessful verification of a detailed model based on the Mips architecture. In it we 
demonstrate that the predicted values method is general enough to be used on 
different architectures and extend its use to systems with memory. Furthermore, 
we extend the verification method to include progress, thus verifying that dead- 
lock does not occur. We also demonstrate the use and advantages of tlpvs in 
microprocessor verification. 

In the next section we discuss related works. Section 3 describes our model, 
MIPS, Section 4 explains the use of predicted values and Section 5 overviews 
TLPVS. In Section 6 we describe our proof of correctness. Space constraints limit 
the detail in this paper; annotated PVS files are available at [Tlpvs]. 

2 Related Works 

Out-of-order execution mechanisms have been a very popular area of research 
in the last few years. Quite a number of techniques have been developed and 
applied to a variety of models. However, the models used and simplifications 
made are not standardized, making comparisons difficult. 

The method of completion functions proposed by Hosabettu et al [HGSOO] is, 
like ours, purely deductive, and uses PVS. Completion functions are used to com- 
plete every unfinished instruction in the implementation system which can then 
be compared with the specification system. Whereas the completion functions 
recursively compute the future value of the instruction, we use predicted values 
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to obtain the same value without flushing, and without constructing completion 
functions. We believe that in the examples we have examined predicted values 
are easier to calculate and support than completion functions. 

Whereas completion functions can be seen as implicit flushing mechanisms, 
explicit flushing mechanisms have also been used in out-of-order verification. In 
Burch and Dill’s seminal paper [BD94] a pipelined systems is verified using refine- 
ment and flushing. After flushing the implementation state it is compared with 
the specification model. However, this approach does not work for out-of-order 
architectures as flushing the large buffers of partially completed instructions is 
too complex [SJD98a]. In order to verify out-of-order scheduling an incremental 
flushing mechanism [SJD98a] was proposed, as well as induction [SJD98b]. 

More recently, Lahiri and Bryant [LB03] used shadow variables, as well as 
refinement maps, to verify various processor models using UCLID, a deductive 
system based on the logic of CLU. Their shadow variables are taken directly 
from the abstract system - it is verified that the values computed in the con- 
crete state match these auxiliary values. These auxiliary variables are similar to 
our predicted value fields, however conceptually our proof of the correctness of 
prediction is independent of the abstract system, while this proof is dependent 
on both. They also verify models including superscalar dispatch and retirement 
(though not, apparently, in conjunction with speculation / memory operations.) 

Jhala and McMillan [JMOl] use refinement maps to modelcheck out-of-order 
execution systems. Like [LB03] they use auxiliary variables calculated by the 
abstract system. This proof has a fundamentally deductive flavor although in- 
variants are proved using model checking. This has the advantage of increased 
automation. However, the amount of user understanding needed to construct the 
correct refinement maps, and use the co-induction and abstraction mechanisms 
is, in our experience, far from negligible. Furthermore, this method relies heavily 
on symmetry, and it is unclear how it would work in asymmetric systems. 

In [SH98] a model including speculation, memory operations, external in- 
terrupts and precise exceptions is deductively verified. An intermediate model 
comprising a table of history variables (MAETT) is used to verify the system 
in ACL2. The entire system state is stored in this table at dispatch time, and 
removed if a flush occurs. Like them, we also use auxiliary variable to allow for 
roll-backs in the case of a mispredicted branch, however we save only memory 
values, and only when branches are executed. Far more, and more complex, aux- 
iliary variables are used than in our proofs, but none are used to ‘predict’ future 
values. The model is impressively detailed, but the proof has the disadvantage 
of being specific to one configuration and limited to bounded resources. 

Our fully parameterized model, mips, is not restricted to bounded resources, 
and we do not explicitly use symmetry in our proofs. However, we exclude ex- 
ceptions, a feature included in most of the models mentioned above. In the past 
we have used predicted values to verify systems with exceptions [APOO], and 
believe that there is no difficulty, in principle, in doing so in this case also. 

While other researchers chose to extend their models and thus demonstrate 
how their methods scale up in the face of increasing complexity, we chose to test 
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Fig. 1. Data structures for mips. Shading indicates auxiliary variables. 



the flexibility of our methods by trying them on a totally different out-of-order 
execution mechanism - that of the mips model. To the best of our knowledge no 
other method has yet been used to verify this out-of-order execution mechanism. 
Its varied data structures and complex control algorithm make it more challeng- 
ing than our Tomasulo-based models; we believe that it is the most complex 
model verified in the academic community to date. Furthermore, this is the first 
out-of-order processor verification effort which includes a proof of progress - a 
necessary feature which appears to have been universally overlooked. 



3 A Model Based on the Mips RIOOOO 

In this section we detail our model, mips, based on [Yea96,Gwe94]. We have tried 
to make our model as accurate as possible, but made some simplifications, and 
took some assumptions in cases where the exact implementation was unclear 
to us. The more significant of these are explicitly noted. Our model includes 
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arithmetic and memory operation, load forwarding, branch prediction and spec- 
ulation, but not exceptions. The two most prominent differences between the 
Mips RIOOOO algorithm and the Tomasulo-like algorithms on which models are 
generally based are the register renaming and branch verification strategies. 

Register renaming is effected by clearly distinguishing the logical register 
numbers, referenced within instruction fields, from the physical registers, which 
are locations in the hardware register file (RF). A register map table (RMT) 
maps each logical register to a physical register in RF. Other registers in RF 
may be used to hold prior values of logical registers. A busy-bit table (BB) 
indicates for each physical register whether it contains a valid value. The free 
list (FL) records the list of physical registers not currently in use. 

The processor verifies branch prediction as soon as the branch condition 
is determined, even if earlier branches are still pending. If the prediction was 
incorrect the processor immediately aborts all instructions fetched along the 
mispredicted path, and restores its state to that before the misprediction. Every 
dispatched branch is stored in a branch stack (BS). In addition to the alternate 
branch address pc (program counter), the BS stores a copy of the current RMT, 
a pred bit indicating whether the branch was predicted taken, and the address 
queue write pointer position, aq-wrPtr^ . This is significantly more complex than 
flushing the entire re-order buffer when retiring a mispredicted branch - the 
mechanism typically used in verification models. 

The active list (AL) is a circular buffer which maintains program order be- 
tween all dispatched but not yet retired instructions. (It is functionally compara- 
ble to the re-order buffer in Tomasulo-type models.) Arithmetic and branching 
instructions awaiting execution occupy entries in the integer queue (IQ). Simi- 
larly, memory instructions occupy entries in the address queue (AQ)^. We model 
memory as a mapping from address to value, both of which are undefined. 

The active list and address queue are both ordered circular buffers, with write 
pointers pointing to the next free entry, and read pointers noting the oldest queue 
entry. The integer queue and branch stack are unordered. 

During dispatch an instruction is allocated the next entry in the active 
list. In addition, instructions are allocated entries in the IQ or AQ depending 
on their type. The current physical registers for operands are looked up in the 
RMT, their availability checked in the BB, and the information stored in the 
IQ (AQ). The IQ {AQ) entry also contains a dest field indicating the physical 
register in which the result will be stored, a tag al pointing to the active list 
entry associated with this instruction and any immediate value (im) encoded in 
the instruction. Address queue entries also have a done field recording whether 
the address has been calculated, and a field storing the address. A branch mask 
bmchjmask is calculated for new entries in the IQ by noting which entries of the 
BS are currently occupied. Intuitively, this mask will later be used to determine 



^ It appears that the write pointer and a prediction bit are stored, but this is not 
explicitly stated in the literature. 

^ The Mips RIOOO also has a floating-point queue, very similar to the integer queue, 
which we have not modeled. 
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whether this instruction is along the path of a mispredicted branch and should 
be flushed. 

If the instruction has a target register, t, it is also allocated a free physical 
register from the FL, and the RMT is updated accordingly. The identity of the 
previous physical register to which t was mapped is stored as old-reg in the AL. 

Branch instructions are predicted, and the relevant information is stored in 
a new entry of the branch stack. The index of the BS entry is stored in the AL 
as the bid (branch id)^. 

Address calculation can be performed when all operand values are avail- 
able. Using operand values obtained from the register file, the address is calcu- 
lated and stored in the AQ. A done bit in the AQ is set after address calculation. 

Load execution occurs after the load address has been calculated and the 
address of all preceding stores in the AQ have also been calculated. If any of 
these stores is to the same address as the load, then the value of the most recent 
preceding store is forwarded to the load. Otherwise, the value is obtained from 
the memory. The new value is written to the target register in the RF, the BB 
is updated, and the done bit in the AL entry is set. 

Arithmetic instruction execution is enabled when all operands in the 
IQ are ready. The calculated value is written back to the RF, and the BB. The 
AL entry is updated, as are all matching operand fields in the AQ and IQ. The 
IQ entry is freed. If the instruction is a correctly predicted branch, then the BS 
entry is freed, and the corresponding branch mask bit is reset in all IQ entries. If 
the instruction is a mispredicted branch, then the program counter {pc) is set to 
the alternative value and all instructions succeeding are flushed: All IQ entries 
whose branch mask matches this branch are freed; the write pointer of the AL 
is set to the entry succeeding the branch (effectively freeing all entries after it); 
the RMT is restored to the values stored in the BS, i.e. all mappings due to 
instructions succeeding the branch are undone; physical registers allocated to 
instructions along the mispredicted path are returned to the free list; the write 
pointer of the AQ is restored to its position when the branch was dispatched. 

The instruction at the head of the AL can be retired if its done bit is set. 
The physical register stored in its old-reg held is freed and added to the free list. 
Thus, a physical register is freed only when the next instruction targeting the 
same logical register is retired. The value in the old_reg register may be needed 
as an operand value for instructions dispatched after it but before the next 
instruction targeting the same logical register. The retirement policy ensures 
that register values are available as long as there is a possibility that they will 
be needed. If the instruction is a load or store, the entry at the head of the 
address queue (which must match this instruction) is also freed. In the case of a 
store memory is updated with the store value. 

To this system we add a number of auxiliary variables (shaded in Fig. 1). 
Some auxiliary fields duplicate information in a more accessible form (the aqAq? 
field of A L records whether it is an arithmetic or a memory instruction) or 

® It is likely that the AL does not really have a bid field, and another mechanism, 
which we could not determine, relates BS entries to those in the AL and IQ. 
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provide pointers to related data structures (the slot fields in the active list points 
to the instruction’s IQ or AQ entry). The auxiliary BS .pvmem field allows us to 
roll-back memory in the case of a mispredicted branch (Section 4.1). The system 
also contains many auxiliary predicted value fields, the use of which is explained 
in the next section. 

4 Out-of-Order Execution and Predicted Values 

Verification of out-of-order (000) systems often makes use of refinement. That 
is, the concrete, out-of-order implementation design is compared to an an abstract 
system specifying all the acceptable correct computations. Typically, the abstract 
system is taken to be a sequential system in which each instruction is completed 
(issued, executed, retired) in one step. 

A difficulty with this comparison is that in the 000 execution systems the 
register file and memory are updated many cycles after the instructions are is- 
sued. Thus the program counter in the two systems may match if we synchronize 
at dispatch time, but the register file only if we synchronize at retirement time. 

We developed the predicted values approach to deal with this disparity. Pre- 
dicted values are auxiliary fields mapped to some or all of the value fields in the 
system, predicting the eventual value. This prediction is fully deterministic and 
depends only on other values in the system at the time of its calculation. Using 
the terminology of [AL91] these are history, not prophecy, variables. Predicted 
values form a “shadow” system mirroring the “real” computation except that 
all computations occur at dispatch time, using predicted values for operands. 
Our method consists of two verification steps: 

1. Correctness of prediction: We determine conditions under which values and 
predicted values in the 000 system agree. These are ordinary, single system 
invariants. 

2. Refinement: We use refinement to prove a relationship between the predicted 
values in the 000 system, and values in the sequential system. The systems 
are synchronized at dispatch time. 

This method has the advantage of allowing us to relatively easily synchronize 
the out-of-order and sequential systems at dispatch time, proving refinement 
without costly flushing or complicated roll-back mechanisms. 

4.1 Using Predicted Values in mips 

In the initial state all predicted value fields are equal to the real values. When the 
first instruction is dispatched its predicted value is calculated by applying the 
instruction operation to the predicted values of its operands. We consider, first, 
an arithmetic operation. When it is dispatched its predicted value is calculated 
from the predicted value of its operands in the RF, and stored in the IQ entry 
and as the predicted value of its target register in the RF. 

Similarly, loads and stores calculate their predicted addresses {pvaddr) by 
using predicted values in the RF . The predicted value of a load is the predicted 
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value at the pvaddr location in memory. Stores update the predicted value of 
the pvaddr location in the memory. 

When branches are dispatched, the system speculates whether or not to take 
the branch using value fields in the system, a decision noted in the BS.pred field. 
(Note that this is not a predicted value - it is a system based branch prediction 
that may be incorrect. It records whether the branch is speculatively taken.) 
Using predicted value fields from the RF for the source operands, the branch 
condition is evaluated, determining whether it should be taken. This value is 
stored in the pv field if the IQ. The pred-ok field records whether the branch is 
mispredicted (by comparing BS .pred with IQ.pv). In addition, a complete copy 
of the memory predicted values is stored in the BS .pvmem field. 

When a mispredicted branch is verified and flushed, it is necessary to restore 
the RMT and the mem to the state they were in before the misprediction. 
Instructions along the mispredicted path may have updated the RMT and these 
updates are undone by copying the BS.rmt field to the RMT . The real values 
in memory will not have been affected by these instructions as stores update 
memory only when they are retired. However, we do update the predicted values 
in memory when stores are dispatched. Analogously to the treatment of the 
RMT, we save the predicted memory values as BS .pvmem when a branch is 
dispatched, and restore them if it is verified as being mispredicted. 

Predicted values are neither read nor written during address calculation, 
instruction execution (excluding misprediction verification) and retirement. 

5 A Brief Overview of tlpvs 

In order to reduce the enormous manual effort in conducting deductive proofs, 
we developed tlpvs which includes a formal PVS specification of ltl based 
on [MP95] and a framework for defining systems. A number of rules for proving 
safety and response properties are included in the system, each one accompanied 
by a strategy supporting its use. These rules and strategies greatly reduce the 
routine theorem proving interaction. 

All proof rules used are defined and proved correct within tlpvs. In doing 
so we eliminate the pen-and-paper application of “known” rules typical in many 
proofs, and the validity of our final proof rests solely on the correctness of PVS. 

5.1 Parameterized Fair Systems 

The computational model of parameterized fair systems [PA03] is used for defin- 
ing systems in tlpvs. This is a variation of the fair discrete systems of [KPOO] 
which, in turn is derived from the model of fair transition systems [MP95]. 

A parameterized fair system (pfs) S = (U, 0, p, T , J , C) consists of 

— U : A finite set of typed system variables. We define a state s to be a type- 
consistent interpretation of V. A (state) predicate is a function which maps 
states into truth values. A bi-predicate defines a binary relation over states. 

— 0 : The initial condition. A predicate characterizing the initial states. 

— p: The transition relation. A bi-predicate relating a state to its successor. 
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— JF: A non-empty fairness domain. This is a domain which is used to param- 
eterize the fairness requirements of justice and compassion. 

— J : The justice (weak fairness) requirement. This is a mapping from T to 
predicates (^J '. \T ^ predicate]). For every t S iF, a computation must 
contain infinitely many f7[t]-states. 

— C : The compassion (strong fairness) requirement. These were not needed in 
this proof. See [PA03] for details. 

A run of a pfs is an infinite sequence of states satisfying the requirements of 
initiality and consecution. A computation is a run satisfying the justice and 
compassion requirements. 

Our definitions of ltl are taken from [MP95], and for brevity are omitted. 

6 A Proof of the Correctness of mips 

In this section we use refinement to prove that every execution of mips has a 
matching sequential execution, thus proving the safety property that mips com- 
putes values correctly. Thereafter we prove progress by proving that a matching 
computation of the sequential system can be found, in which infinitely many 
non-idling steps are taken. 

More precisely, we prove that our concrete system, mips: 

T C^), refines an abstract system SEQ: 6>^, IF^, C^) 

in which each instruction is completed in a single step. Since both mips and 
SEQ include a program counter (pc), memory (mem), and register file (RF), we 
subscript mips instances with “C”, and SEQ instances with “A”. 

Let and denote the sets of concrete and abstract states respectively. 
Let 17, referred to as the domain of observations, denote a set of elements. 
Let Q and i-^- 17 be two functions termed the abstract 

and concrete observation functions, respectively. They indicate the parts of the 
systems compared in the refinement relations. We define an interpolating system 

s, = {V, = v^\jv,,e^he\,p^hp\,T^,j^,c^) 

where j k)) j and 6>* (P^ , P^) may refer to all variables in P^ U P^ . Func- 

tions 6>* and p* allow us to “choose” from the possible transitions of SEQ, one 
that correctly matches the mips transition. We denote the P^ (P^) component 
of Vj by Vj])c respectively). 

Within TLPVS we define, and prove the validity of, rule REF (Fig. 2). Intu- 
itively, R1 and R2 together ensure that the component of a run of S, is a run 
of S^. Premise R1 requires that an initial S'^-state matching the initial S^-state 
can be found. Premise R2 requires that an S'^-state satisfying p* can always be 
found, ensuring that an 5^-successor state can be built. Premise R3 asserts that 
throughout the computation the observation functions of the two systems agree. 
The conclusion, that refines S^, denoted Q S^, is formalized as: 



(1) For every computation seq^ of S^, there is a run seq^of 
such that at every time t, 0^(seq^(t)) = 0^(seqj,(t)). 
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Formula 6** initializes the pc, mem, and RF in with the same values as in 
. Transition relation p* defines an instruction execution whenever defines 
a dispatch which is not along a mispredicted path. All other -transitions cause 
p* to idle. Premises R1 and R2 are trivial to verify. We discuss the proof of R3: 
We define as the tuple (pc^, mem a, RF a)- 

If the current state of mips does not include a mispredicted branch (detected 
by checking the auxiliary pred-ok variable in BS) then is defined as 

{pcQ, Xa : memc-pv{a), Xr : RF c{RMT{r)).pv) 

That is, we take the current program counter and the predicted values for mem- 
ory. We use the predicted value of each logical register r, obtained from the RF 
by using the RMT to identify the physical register to which r is mapped. 

Otherwise, is derived from the branch stack. Letting firstMis be the index 
of the first mispredicted branch, we define as 

{BS {first Mis). pc, BS{firstMis).pvmem, Xr : RFc{BS {first Mis). rmt{r)).pv) 

That is, we take the alternative branch address store in the BS, the copy of mem- 
ory predicted values stored when the branch was taken, and the predicted values 
for logical registers obtained using the BS.rmt mapping. Intuitively, BS .pvmem 
and BS.rmt record “snapshots” of the system before the misprediction occurred, 
and unlike RMT and mem, do not include changes made by instructions along 
the mispredicted path. 

Result (1) refers to auxiliary variables in mips. From it we derive 

(2) For every computation of S^, there is a run seq^of S^, 
such that at every time t at which seq^{t).AL is empty: 
pc A = pcc 

A RFa = Xr : RFc{RMT{r)).v 
A memA = Xa : memc{a).v 

by proving that predicted values and real values match under certain conditions. 

Verifying that R3 of ref holds required that a number of system properties 
be proved invariant. Similarly in deducing (2) from (1). 

We proved the invariance of 19 general properties if mips, such as that if 
slot S of the IQ is occupied, then its al field points to an occupied entry in the 
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AL, in which the done bit is false. We also prove the invariance of 7 properties 
relating directly to predicted value. Properties in this group include prediction 
correctness (under which conditions value and predicted values are guaranteed 
to agree) and relationships between predicted values in different structures. 

Most of the 26 properties whose invariance we proved, the bulk of the verifi- 
cation effort, fell into 1 of 4 groups of mutually inductive properties. We defined, 
and proved the correctness of, a simple compositional mutual induction rule 
which allowed us to verify each property in the group separately. 

6.1 Proving Progress 

The refinement proof described above is not very dissimilar from that of other 
researchers e.g. [LB03,JM01]. However, we claim that it is insufficient. There are 
errors that may not be found by the above method ~ errors which prevent the 
system from progressing. Consider, for example, a version of mips in which we do 
not return the old-reg register to the FL on retirement. Within a finite number 
of steps the system will flush itself out {AL empties) but be unable to dispatch 
the next instruction with a target as there are no “free” physical registers. The 
matching abstract run will be one with an infinite suffix of idling transitions. 

To prove progress we define the justice condition for SEQ as An : pc^ yf n. I.e., 
for every n G N’'", we require that infinitely often the program counter is not n. 
Intuitively, the sequential system can always progress, and doing so causes the 
program counter to change. However, it is possible for a program to violate this 
condition by containing a branch, at location m, with m as its branch target. 
We obviate this undesirable scenario by assuming that Vn : prog{n).brtgt yf n. 
We now try to prove that 

(3) For any computation seq^ of S^, there is a computation seg^of S'^, 
in which infinitely many non-idling steps are taken, 
such that at every time t 0^{seq^{t)) = Oj^{seq^{t)) and 
if seg„(t).HL is empty then: pc a = pcr 

A RFa = Ar : RFc{RMT{r)).v 
A memA = Aa : memc{a).v 

That is, we try to show that every computation of mips matches a fair run of 
SEQ in which progress is made (infinitely many instructions are completed). It 
is easy to derive (3) from (I) and (2) once we prove the response property 

(4) MIPS ^ Vn : U[[getPC = n) — >0{getPC yf n)) 

where getPC is defined as pc^ if the system contains no mispredicted branches, 
BS {first Mis). pc otherwise. 

We use a derivation of the well rule of [SPBAOO] (Fig. 3) to verify (4). 

The justice requirements we define for mips are that every sub-transition 
(dispatch, address calculation, execution, retirement) which is enabled infinitely 
often is eventually taken. 

We define a ranking function which decreases every time a partially executed 
instruction which is not along a mispredicted path progresses (e.g. its address is 
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Rule WELL 

For PFS S = (V, O , p, T , J, C) , 

Given initial and goal predicates p,q, helpful predicates {ht : t € fA}, 
a well founded relation ^ over A, and ranking functions 5 : S A 

Wl. p ^ q W \J ht 

teJ- 

W2. Vt G fP' : ht A p ^ q y \J (5 > S' A h'„) V {h't A S = S' A ^j'[t]) 

ueJ- 



a{p — tOq) 

Fig. 3. Rule well. Primed variables refer to values in the next state. 

calculated, or the instruction is retired). Instructions along mispredicted paths 
do not effect the rank. When an instruction which is not along a mispredicted 
path is dispatched (a goal dispatch), the pc changes and the goal state is reached. 

The helpful predicate requires that if AL is empty then an instruction be 
dispatched, otherwise that the instruction at the head of A L progress. To prove 
that always holds for some u, we show that the instruction at the head of 
the AL (if any) can always progress, and that a dispatch is always enabled if 
the AL is empty. The latter required us to prove new safety properties regarding 
resource recovery, properties unnecessary for the proof of (2). 

Justice conditions ensure that instruction at the head of the queue progresses, 
and thus the rank decreases. Well-foundedness ensures that as long as no goal 
dispatch occurs, the rank continuously decreases until the AL is empty. At this 
point justice conditions ensure that a goal dispatch occurs. 

7 Conclusion 

In this paper we present the predicted values method for the verification of 
out-of-order execution. Using predicted values we are able to prove refinement 
between a complex out-of-order execution system and a simple sequential system. 

Our ability to verify a model of the size and complexity of mips is in no 
small part thanks to the use of tlpvs, which eliminates part of the drudge work. 
Despite this, our proofs are by no means automatic, and the effort required (two 
to three person months) is significant. However they are fully automated, with 
every rule being proved within the PVS theorem prover (as part of tlpvs). We 
have not sufficed at using a “known” refinement rule (or any other rule), but have 
verified the rule as well. In totally eliminating the pen-and-paper element, we 
believe that we provide a higher degree of certainty than most previous proofs. 
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Abstract. Sequential consistency is the archetypal correctness condition for the 
memory protocols of shared-memory multiprocessors. Typically, such protocols 
are parameterized by the number of processors, the number of addresses, and the 
number of distinguishable data values, and typically, automatic protocol verifica- 
tion analyzes only concrete instances of the protocol with small values (generally 
< 3) for the protocol parameters. This paper presents a fully automatic method for 
proving the sequential consistency of an entire parameterized family of protocols, 
with the number of processors fixed, but the number of addresses and data val- 
ues being unbounded parameters. Using some practical, reasonable assumptions 
(data independence, processor symmetry, location symmetry, simple store order- 
ing, some syntactic restrictions), the method automatically generates a finite-state 
abstract protocol from the parameterized protocol description; proving sequential 
consistency of the abstract model, via known methods, guarantees sequential con- 
sistency of the entire protocol family. The method is sound, but incomplete, but 
we argue that it is likely to apply to most real protocols. We present experimental 
results showing the effectiveness of our method on parameterized versions of the 
Piranha shared memory protocol and an extended version of a directory protocol 
from the University of Wisconsin Multifacet Project. 



1 Introduction 

Shared-memory multiprocessors are the dominant form of multiprocessing. In such sys- 
tems, the processors share a single address space and interact by reading/writing to a 
shared memory system. A memory model is the correctness condition for the memory 
system, defining the processor- visible behavior of the system. 

Sequential Consistency (SC) [20] is the archetypal memory model. Informally, SC 
states that every execution of the system must behave as if it were some interleav- 
ing of the individual processors’ executions on a single atomic memory. For example. 
Fig. 1 shows an execution that is not sequentially consistent, because there is no way 
to interleave the executions of the two processors on a single memory and obtain the 
observed results. SC continues to be important both from a verification perspective 
as a well-defined and extensively-researched challenge problem, as well as from an 
implementation perspective as a memory model balancing ease-of-programming with 
implementation flexibility [17]. 

* This work was supported in part by a research grant and a graduate fellowship from the Natural 
Science and Engineering Research Council of Canada. 
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proc\ : (write addr2 val\), (write addr\ val2), (read addr2 val\), (read addv2 vfl/2) 
proc2 '■ (write adrfri val\), (write addr2 val2), (read addri val\), (read addri vfl/2) 

Fig. 1. Example execution that is not sequentially consistent. The values seen by the two reads on 
one processor imply that the other processor’s second write appears to have occurred between the 
two reads; no interleaving can satisfy this property for both processors simultaneously. Note that 
the operations on each address, considered in isolation, are sequentially consistent, demonstrating 
that sequential consistency cannot be verified on a per-address basis. (In fact, the per-address 
operations satisfy the even stronger notion of simple sequential consistency, defined in Sec. 2 .). 



Memory systems use intricate finite-state protocols to implement the desired mem- 
ory model. These protocols are notoriously complex and error-prone, because the pri- 
mary objective is performance rather than simplicity. With the ascendance of finite-state 
model checking [9] as an automatic verification method, the verification of multiproces- 
sor memory system protocols has been a major success story of the practical application 
of formal verification. 

Finite-state model checking is limited, of course, to finite-state systems. In real- 
ity, memory system protocols are defined as parameterized systems — typically by 
the number of processors, the number of addresses, and the number of distinguishable 
data values. Most automatic protocol verification efforts have therefore considered only 
concrete instances of protocols. Furthermore, because of problems with state-space ex- 
plosion, the instances verified are generally very small, e.g., for a detailed model of a 
typical industrial protocol, a successful model-checking run with 3 processors, 3 ad- 
dresses, and 3 data values is a remarkable achievement. Far better would be a method 
to automatically verify an entire parameterized protocol family. 

In theory, handling a parameterized number of processors is most interesting, be- 
cause shared memory protocols are intended to facilitate complex interactions among 
processors. In practice, however, handling parameterized numbers of addresses and data 
values is a higher priority, because real shared-memory multiprocessors have few pro- 
cessors and many addresses and data values. For example, typical configurations have 
2 to 8 processors, with even the largest installations having at most a few dozen pro- 
cessors. In contrast, even the smallest and most common configurations (e.g., a hyper- 
threading desktop PC) have at least 2^^ data values and hundreds of millions of physical 
addresses (with much larger virtual address spaces) — far beyond the reach of the direct 
application of model checking for the foreseeable future. 

This paper presents a fully automatic method for proving the sequential consistency 
of an infinite family of protocols parameterized in two dimensions: the number of ad- 
dresses, and the number of data values. We consider the number of processors to be a 
fixed constant. Our approach relies on data independence to handle the parameterized 
data values; our main contribution is a means to handle parameterized addresses. Note 
that unlike easier- to- verify properties like seriality or linearizability [16], sequential 
consistency cannot be verified on a per-address basis. (See Fig. 1.) No previous fully 
automatable method for verifying sequential consistency can be parameterized by both 
the number of addresses and the number of data values. (See Sec. 7 for related work.) 

Our method leverages a few common, practical assumptions about memory system 
protocols. Three of these — data independence, processor symmetry, and location sym- 
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metry — are standard and easily enforced syntactically. We impose some additional 
syntactic constraints to simplify the automatic generation of a finite-state abstract pro- 
tocol from the parameterized protocol description; these are described in Sec. 4. Finally, 
our method verifies a slightly stronger form of SC in which writes to an address cannot 
be reordered. To our knowledge, all implemented SC protocols implement this stronger 
form. (The canonical example of a protocol that is SC, but violates this assumption is 
Lazy Caching [1].) With these restrictions, our method is, in principle, fully automatic. 

2 Preliminaries 

A labelled transition system (LTS) is a tuple M = (5,2,7, — >), where 5 is a set of 
states, 2 is a finite alphabet, 7 C 5 is the set of initial states, and — >C 5 x 2 x 5 is the 
transition relation. The language L{M) of the LTS M is the subset of 2* defined in the 
usual way. Given an alphabet 2 and some string x, define x 1 2 to be the string obtained 
by deleting all symbols of x that are not in 2. We extend | to act on sets of strings in 
the obvious way. 

Denote by N the set of positive integers, and by the set { 1 , . . . , n}. For sets P, A, 
and y, \etMemEvents{P,A,V) be the set {7?,W} xPxAx y.ThenMem7ivente(N,N,N) 
is called the set of memory events, also denoted simply MemEvents. An occurrence 
of memory event (R,p,j,d) is meant to represent processor p reading value d from 
address j, we call such an event a read. Similarly the write event (W,p,j,d) indicates 
processor p writing value d to address j. We call a finite string over MemEvents a 
trace. A shared memory protocol, hereafter simply protocol, is formalized as a LTS 
P = (5,Mem7?venti(N„,Nm,Nv) U7i,7, — >) for some n,m,v > 1, where S is finite and 
E (the silent action labels) is disjoint from MemEvents. The quantities n, m, and v are 
respectively denoted Procs{P), Addrs{P), and Vals{P). Intuitively, these quantities are 
respectively the number of processors, number of memory addresses, and number of 
data values (per address) processed by the protocol. We define PROPS to be the set of 
all protocols. For a protocol P, define traces{P) to be L{P) ] MemEvents. 

We aim to verify correctness of an infinite set of related protocols called a protocol 
family, defined as follows. For fixed n > 1, an n-processor family is a function T : 
N X N ^ PROPS where for all m,v > 1, Procs{J'{m,v)) = n, Addrs{3‘{m,v)) = m, and 
Vals{3^{m,v)) = v. 

A trace is said to be serial if every read event has the same value as the last write 
to the same address, and such a write always exists ^ A trace o is said to be sequen- 
tially consistent (SC) iff there exists a trace o' (of the same length) such that (1) 
o' t Mem7?venti({p} ,N,N) = O t47em7iven^^({p} ,N,N) for each p> I, and (2) o' 
is serial. We call such a o' a serial reordering of o. Furthermore, o is said to be simple 
SC (SSC) if there exists o' with the above two properties plus the additional property 
(3) o' t ({W} X N X {y} X N) = O t ({W} x N x {y} x N) for each y > 1; here o' is 
called a simple serial reordering. Intuitively, SC says that there must exist a reordering 
that is serial and preserves the per-processor order. SSC adds the requirement that the 
ordering of writes to each address is also preserved in the reordering. A protocol is said 

^ The last write to the same address is the rightmost write that is left of the read in question and 
has the same address, if the trace were written out from left to right as usual. 
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to be serial, SSC, or SC, if all of its traces are serial, SSC, or SC respectively. Similarly, 
an n-processor family is said to have any of these properties if all of its constituent 
protocols have the respective property. 

3 The Big Picture 

Our aspiration is to algorithmically verify that an n-processor family 3^ is SSC. This 
section presents Theorem 1, which allows us to soundly reduce the proof that T is SSC 
to checking SSC of a protocol Q, where Addrs{Q) — Pwcs{Q) = n, and Vals{Q) = 3, 
provided that an infinite number of projected trace containments hold between certain 
members of T and Q (see condition 2 of Theorem 1). However, in Sec. 5 we show that 
if T is expressed in a certain formalism, then we can effectively produce a Q for which 
these containments hold “by construction”. SSC of Q can be checked algorithmically 
using known methods based on model-checking [10, 29, 8, 7]. 

In Sec. 3. 1 we define three assumptions that are required by Theorem 1 ; the theorem 
itself is presented in Sec. 3.2. 

3.1 Assumptions 

Here we define three common protocol assumptions: location symmetry (LS), processor 
symmetry (PS), and data independence (DI). 

For a permutation X on N define fo be the function on MemEvents{n,m,v) 
specified by = (op,X{p),j,d). Similarly, define = 

(op,p,X{j),d). We extend and to have domain and range MemEvents* in 
the obvious way. A protocol tP is location symmetric (LS) if for every permutation X : 
^Addrs{T) ^Addrs{T) wc have o G traces(fP) implies X"‘^^''(o) G traces(fP). Similarly, fP 
is processor symmetric (PS) if for every permutation X : f^procs{T) ^Procs{T) we have 
a G traces(‘P) implies XP''°‘^(g) G traces(‘P). A family is said to be location symmetric 
or processor symmetric if all protocols in its image have the respective property. 

Intuitively, data independence (DI) in a system means that variables of a certain 
type can only be nondeterministically assigned, copied, and outputted [32]. When the 
system is a protocol, and the type is data values, one can define how DI manifests at 
the trace level. Qadeer [29] gives the following trace level definition of DI. A trace is 
called unambiguous if it has the feature that no two writes to the same address write 
the same value. Given m,v, v' > 1, we call a function X : Nm x — > Ny a renaming 

function, and define X}'°^((op,p,j,d)) = (op,p,j,X(j,d)). We extend X™^ to traces in 
the obvious way. Then an «-processor family T is said to be DI if for all m, v > 1 and 
traces o, we have o G traces(fp(m, v) ) if and only if there is v' > 1 , an unambiguous trace 
o' G traces(3^{m, v')), and a renaming function X : Nm x Ny^ ^ Ny such that o = X'"'^(o'). 

3.2 Reduction to Finite-State SSC 

Theorem 1. Let T be an n-processor family that is processor symmetric, location sym- 
metric, and data independent. If there exists a protocol Q such that 

1. Q is SSC, and 

2. For all m> n we have that traces(‘J(m, 3)) t3Tem£’ve«te(N„,N„,N3) C traces(Q) 
then fp(m, v) is SSC for allv> 1 and m> n. 
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Proof: The detailed proof is available from: 

http ; / /www . cs . ubc . ca/~ jbingham/bchqz 04 -proof s . pdf 
The proof relies on machinery developed in [29] to show that, under the LS, PS, and DI 
assumptions, if there exists a G tmces(3^(m,v)) such that a is not SSC, we can detect 
that a is not SSC by considering only a ^ MemEvents{N„,A,Nv), where A C N has 
cardinality n. By exploiting the symmetry assumptions, we can ensure that such a a 
exists for which A = N„. Furthermore, we can ensure such a a exists using only 3 data 
values. Hence, if Q is SSC, and the projected traces of all family members are contained 
in tmces{Q), then all family members must be SSC. D 

We note that Theorem 1 does not allow concluding that T(m,v) is SSC for 1 < 
m<n. This is not a deficiency, since we may verify correctness of these members via 
a finite number of model checks. Practically, these are the uninteresting cases, since 
multiprocessors always have many more addresses than processors. 

Suppose we are given T and wish to determine if a candidate protocol Q exists for 
which the two conditions of Theorem 1 hold. Our approach involves automatically con- 
structing a candidate Q such that condition 2 is guaranteed to hold; this construction 
involves abstracting T in some sense. The automatic construction of Q is possible be- 
cause of the formalism we use to describe T, presented in Sec. 4. Condition 1 is then 
checked algorithmically using known methods. If this check is successful, the conclu- 
sion of Theorem 1 follows. Otherwise, the approach has failed, and we can draw no 
conclusions; in other words the approach is sound but incomplete. Hence, to argue for 
the applicability of this approach, one must argue that real protocol families 

1. adhere to the LS, PS, and DI assumptions, and 

2. can be expressed in our formalism of Sec. 4, and 

3. won’t yield false negatives, i.e. if the family is SSC then the approach succeeds. 

It is widely accepted that real protocols satisfy property 1 [29]. Our formalism of Sec. 4 
is quite general, encompassing all real protocols that we have encountered. For instance, 
all of the protocols [1,5,22,6, 19, 14] are expressible in our formalism. In support of 
item 3, we present successful experiments on two challenging protocols in Sec. 6. 

4 A Protocol Description Formalism 

To describe an automatic construction of a candidate finite-state protocol Q from a pa- 
rameterized protocol description, we must choose some sort of protocol description 
formalism. Here, we will assume that the «-processor family T is expressible in a very 
general syntax based on first order logic that is inspired by the bounded-data parame- 
terized systems of [27]. We have tuned our formalism to provide enough expressiveness 
for the real protocol descriptions we have encountered, while still allowing the efficient 
and automatic generation of a sufficiently finely abstracted protocol Q. For the sake of 
perspicuity, we will treat the number of data values as being fixed af 3, since condition 
2 of Theorem 1 considers such family members. Standard restrictions can be imposed 
on our formalism to ensure that the family is PS, LS, and DI. For instance, DI can easily 
be enforced by syntactic constraints as observed by Wolper and others [32, 25, 29], and 
symmetric types such as Mur(p scalarsets [18] can be used to ensure PS. LS is inherent 
to the syntax [27]. 
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4.1 Syntax 

We assume three sets of variables X = {xi, . . . ,X|x| }, Y = {yi, . . . ,y|y| }, and Z = 
{zi, . . . ,Z|z| }. For a set D, let Z[D] denote the variable set {zi[d] | 1 < i < |Z| AJ € D} 
and let Vars(D) = X U F UZ[Z)]. Priming any of these sets has the effect of priming all 
constituent variables; semantically, primed variables will represent the next state. The 
variables of X are Booleans, while the variables in Z are arrays of Booleans indexed 
by addresses. The variables of Y will range over addresses, hence we call these vari- 
ables address ranged variables (ARVs). Of course non-Boolean hnite types and arrays 
of such can be encoded in this framework. In shared memory protocols, typical (though 
by no means exhaustive) examples of the variables in X, Y, and Z, are respectively helds 
corresponding to processor IDs or message types in messages, fields storing addresses 
in messages, and the permission bits and data value associated with each address in a 
local cache or main memory. 

We will employ auxiliary ARVs a and ai to quantify over, and let ARVars denote 
the set FU T' U {a,ai}. Dehne quantifier-free actions (QFA) as formulas with syntax: 

O ::= X I z[a] \ z[ai] \ (a = (3) | | (OVO) | (OAO) 

where x € V U A', z G Z U Z' and a, (3 G ARVars. For QFA (|), we write (|)(a) (resp. 
(|)(a,ai)) to emphasize that the set of auxiliary ARVs appearing in (|) is a subset of {a} 
(resp. subset of {a,ai}). We call upon a set of action labels Labels, of which we require 
{{R, VF} X N„ X N 3 ) C Labels. The transition relation of T must be expressible as a set: 

{ri{a) I I G Labels} 

where, for each £ G Labels, rg is a formula of the form 

rg{a)=(^g{a)A\/ai:\\)({a,ai). ( 1 ) 

Here, i^g{a) and \\tg{a,ai) are arbitrary QFAs and we call formulas of the form (1) 
restricted actions. The initial state predicate Init must be of the form: 

Init = Va : init{a) 

for some QFA init{a) that does not contain any primed variables. 

Intuitively, a transition in a protocol expressed in our formalism must satisfy rg{a) 
for some £ and some “distinguished” address a. <}i{a) dictates what happens to state 
related to address a, i.e. ath entries of arrays, while the conjunct Vai : \\fg{a,ai) dictates 
the uniform effect on all other addresses. This restriction accords exactly with what we 
have observed in real protocol descriptions. We find that real protocol transitions have 
the property that ARVs are referenced and/or modihed at no more than a single index 
a, with the exception that some more complex transitions will also modify entries at 
all other indices in some homogeneous way, hence the inclusion of Vai :\\fg{a,ai) in 
restricted actions. Usually, \\fg{a,ai) will simply state that if ^ (a = ai), then the aith 
entries of arrays are left fixed. However, several transitions in one of the protocols we 
experimented with have a more involved \\tg, e.g., a state change for one address forces 
all other addresses to abandon an optimization mode. A theoretical limitation of our 
formalism is that the auxiliary ARVs a and a\ are the only ARVs that can be used to 
index into arrays. However, typical instances of indexing using another ARV y G F can 
be performed by, for example, {y = a) A somejarraf \a\ . 
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4.2 Semantics 

In this section we formally define the set of LTSs represented by our family syntax. 

Let j be a valuation of the variables Vars(D) such that variables of X and Z[D] are 
assigned Boolean values, and variables of Y are assigned values of some type R, and let 
s' be a valuation to Vars(D)' with analogous typing. Then we call s&D, R-valuation and 
(s,s') a D, R-valuation pair, respectively. Intuitively, D is the index set for the arrays in 
Z, and R is the type of the variables in Y ; for the protocols of T these will be the same, 
but in Sec. 5 we construct a protocol for which they differ. For D, /^-valuation pair (i', s') 
and restricted action 9(a), we write 9(y) if 9 [i/a] is satisfied when variables 

are valuated by and quantified ARVs are taken to range over N^. If 9 is a QFA, 
we may omit the subscript on N . 

For each m > 1, our syntax defines a LTS T(m,3) = — >) where 

- 5 is the set of all Nm,Nm- valuations 

- L = Labels x N^. In a slight abuse, we will identify the memory event {op, p, j, v) G 

MemEvents with {{op,p,v),j) G L. 

- I = {s\ s\=m Init} 

- — > is the set of all tuples (s, {£,j),s') such that \=m i'e{j) and j G N^. 



5 A Candidate Q 

Here we define a candidate Q for Theorem 1 , which can be viewed as a modified version 
of T(n, 3) ; hereafter Q will refer to such. These modifications involve syntactic transfor- 
mations; the transition relation and initial state assertion can easily be realized automat- 
ically given {r(\£ G Labels} and Init. Intuitively, the modified protocol is a finite-state 
abstraction of the protocols in T(«,3), where everything related to addresses greater 
than n has been conservatively abstracted away. Note, however, that our abstraction is a 
finer abstraction than the typical abstract interpretation [1 1] in which addresses greater 
than n are replaced by an information-destroying T value that propagates throughout 
the interpretation. We need our more accurate abstraction to successfully verify real 
protocols. 

In Q, address-ranged variables have type N„ U {^}. The ^ symbol represents ad- 
dresses in N \ N«. Counter-intuitively, the arrays in Z will still be indexed by N„ be- 
cause the variables of Z[{^}] are existentially quantified out; for brevity we will let 
z [^] denote the variable list zi [^] , . . . , Z\z\ [^] , z) [^] , • • • j zj^j [^] throughout the paper. An 
important part of the transformation is the operator sub{-). For any action r, the action 
sub{r) is obtained by performing the following substitution: each occurrence of (a = (3) 
falling under an odd number of negations (where a, (3 G ARVars) is replaced with^ 

((a=(3)A^((a = ^)A(13 = ^))) 

^ This syntactic substitution gives the same effect on the transition relation as the usual abstract 
interpretation with ^ conservatively abstracting the values greater than n. (The existential quan- 
tification of the Z variables is the key difference between our abstraction and the typical one.) 
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For restricted action 9(a) and integer y € U we write (^,/) Ng 9(y) if 9[y/a] is 
satisfied when variables are assigned by and the universal quantifier in 9 is taken 

to range over N„ U 

We now define Q= {Sq,Lq,Iq, — >g): 

- Sq is the set of all (N„ U {^})-valuations. 

- Lq = Labels x (N„ U {^}). We note that labels of the form ((o,y>, v),^) are not in 
MemEvents. 

- 7g = |s I ^ Ng 3 : sub(Init)'^. The existential quantihcation notation 3z : / 

where z is a Boolean variable and / is a formula is simply syntactic sugar for 
/[tt/z] V/[ff/z]. 

- — >g is the set of all triples (^, (7,y),^') such that y G N„ U {^} and 

(j,/) Ng 3 z[^] : sub{rn{j)) (2) 

The existential Boolean quantification in (2) blows up the formula by a factor of 
2^l^l in the worst case. However, in practice we find that only a small number of the 
variables z[^] are actually mentioned in ry, which mitigates this effect. For example, the 
only dependencies on z[^] might be the 2 or 3 bits representing the access permissions 
at the ^th entry of a single processor’s cache. 

We conclude this section by asserting that the trace set of Q overapproximates that 
of T(m, 3) for all m> n, i.e. Q satisfies condition 2 of Theorem 1 . 

Theorem 2. lraces{3^{m,3)) tMcm£’ven^^(N„,N„,N 3 ) C lraces{Q) for all m> n. 

Proof: The detailed proof is available from: 

http ; / /www . cs . ubc . ca/~ jbingham/bchqz 04 -proof s . pdf 
The construction of Q naturally corresponds to an abstraction function mapping con- 
crete states from T(m,3) to abstract states in Q. This mapping turns out to be a weak 
simulation relation, and the projected trace containment follows. □ 



6 Experimental Results 

To evaluate our technique, we experimented with two protocols, which we call PIR and 
DIR. Both of these protocols are SSC but not serial, hence trace reordering requirements 
are nontrivial. PIR is a simplified abstraction of the Piranha protocol [5]. Our imple- 
mentation is consistent with the details of the simplification presented in [29]. DIR is a 
simplification of a directory based protocol with Scheurich’s optimization [39]. Our im- 
plementation is based on, but is simpler than, the description of Braun et al. [8], which 
was obtained from the University of Wisconsin Multifacet group. We explain part of 
the design of both protocols here and describe the ways in which our implementation 
of DIR is simpler than the description of Braun et al. [8]. 

In both PIR and DIR, each processor p has a cache that contains, for certain mem- 
ory locations y, a data value d(y>,y) and an access permission access(y>,y). The permis- 
sion may be modifiable (M), shared (S), or invalid (/). Processor p may read the data 
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value for location j if access (p, 7 ) is S or M, and may write (change) the data value if 
access( 7 >, 7 ') is M. 

In PIR, in addition to the caches, the system state has a queue per processor and an 
owner per memory location. The state maintains the invariant that the owner of location 
j is either some processor p for which access{p,j) = S 01 M, or the owner is null. 
Requests of a processor to change its access level or to get a data value for a location are 
not modeled explicitly. Rather, if owner(y) = p, in one transition (step) of the protocol 
the triple (d{p,j),j,X) may be placed on the queue of any processor p' ^ p, in which 
case an invalidate message (INV, j) is placed on the queues of any processor p" (other 
than p and p') with accsss{p" ,j) ^ I. Also, owner(j) is set to null and access(p,y) is 
set to INV. In a later transition, when triple (d,j,X) is at the head of p'’s queue, the 
triple may be removed from the queue, in which case d{p',j) is set to d, access(p',y) is 
set to M, and owner( 7 ') is set to p' . Also, if (INVJ) is at the head of the queue of p" , in 
one transition the message may be removed from the head of the queue, in which case 
access(p",y) is set to I. Other access permission changes are modeled similarly. 

The state space of PIR is relatively small because requests of processors are not 
modeled explicitly, and moreover, no directory is used to store the set of processors 
with shared access to a memory location. However, the use of queues ensures that the 
traces generated by PIR are interesting and realistic, in the sense that if in some trace, 
the Zth event O; is a read to location j which inherits its value from an earlier write 
event Ot, then arbitrarily many operations (indeed, arbitrarily many write operations to 
location j) may separate from O/ in the trace. 

The DIR protocol contains many low level details that are not modeled by PIR. In 
DIR, a directory maintains information on which processors have exclusive or shared 
access to a memory location. Each processor and the directory has both a request and 
a response input queue, with different uses. For example, invalidate messages to pro- 
cessors are placed on the request input queue, whereas messages containing data values 
are placed on the response input queue. 

In order to get exclusive access to a memory location, processor p sends a request 
to the directory, which arranges for the data value and the number of current sharers 
of the location to be sent to p. Before reading the data value, p must wait to receive 
acknowledgement messages from each current sharer. Because of the multiple queues, 
several race conditions can arise. For example, the directory might authorize processor 
p' to have exclusive access to location j after p, and p may receive a request to send 
j’s data value to p' before p has even received the data value itself. As a result, a 
processor has 9 permission levels per location in its cache in addition to the I, S, and 
M levels. For example, access(p, 7 ') may be IMS, indicating that p requested M access 
when access{p,j) was /, p has not yet transitioned to M access, but already another 
request for p to downgrade to S access has been received. 

To keep the state space within manageable proportions, our implementation of the 
DIR protocol does not model a certain queue of Braun et al. [ 8 ]. This queue is used for 
transmission of data within a processor, rather than between processors, and its removal 
does not significantly change the protocol. 

Both PIR and DIR were modeled at a fairly detailed level, resulting in substantial 
descriptions in the Murtp language (165 and 1397 lines of code, respectively, excluding 
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comments). The Murtp language is more expressive than the formalism of Sec. 4, but 
the protocol family implementations used only constructs that conform to this formal- 
ism. Although the construction in Sec. 4 is obviously automatable, we do not yet have 
an implementation for the Murtp language, so we performed the construction by hand, 
exactly following the described syntactic transformations. 

In addition, for the DIR protocol, the verification runs against the “automatically”- 
generated Q spaced out, so the protocol was manually abstracted further, yielding Q'\ 
the numbers in Table 1 refer to the completed verification runs against Q' . The essential 
difference between Q and Q' is that in the latter, all fields (except for the address field) 
in messages pertaining to the abstracted address ^ were abstracted. The same approach 
was taken with a local record at each processor called the transaction buffer. Since Q' is 
an over-approximation of Q, the fact that the verification succeeded for Q' proves that 
verification would have succeeded for Q, had we had sufficient memory resources. In 
other words, our automatic construction of Q was accurate enough to prove SSC of the 
protocol family. In general, one could envision a tool that automatically attempts such 
additional abstractions if model checking the original Q is intractable. 

Table 1 gives our experimental results. The 2-processor runs were successful, show- 
ing that our generated abstraction is accurate enough, even for these protocols with non- 
trivial reordering properties (which are therefore hard-to-prove sequentially consistent). 

Table 1. Results for the PIR (Piranha) and DIR (Wisconsin Directory) Protocols. All runs are with 
the number of processors n = 2, except for PIR3 with n = 3. We experimented with two versions 
of PIR: with processors’ queue depth q = I and q = 2. We use a method of Qadeer [29] to prove 
SSC of our generated finite-state abstract protocol. This method requires separate model-checking 
mns for each fc = 1 , . . . , u, to check for cycles of length 2k in a graph of ordering constraints. Times 
are in seconds, on a 2Ghz Intel Xeon with 4GB memory mnning Linux, “prob” is an upper-bound 
on the probability of missing states due to hash compaction, as reported by Murcp [31] (40 bit 
hashes for PIR; 42 bits for DIR). The n = 2 mns concluded successfully, enabling us to conclude 
SSC for each example over all address counts m and data value counts v where m> n = 2. The 
PIR3 results are a preliminary attempt with n = 3 processors. For PIR3 with k = 2, we used 35-bit 
hashes. We have not yet completed ak = 3 mn at press time. 
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7 Related Work 

There is a rich and successful literature pertaining to the verification of assorted safety 
and liveness properties on non-parameterized protocols, which we do not have space to 
summarize. We focus here on work pertaining to verification of SC for parameterized 
families of protocols. The problem of determining whether a finite state protocol is SC 
is undecidable [2], and so clearly also is the problem of determining whether a family 
of parameterized protocols is SC. Therefore, all methods are necessarily incomplete. 
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Some works verify weaker properties than sequential consistency, over parameter- 
ized memory protocols. For example, McMillan [23] uses compositional model check- 
ing to generate invariants that can be used to verify safety and liveness properties of the 
FLASFl cache coherence protocol. Pong and Dubois [28], Delzanno [12], and Emerson 
and Kahlon [13] have produced automatic methods to verify safety properties over pa- 
rameterized memory protocols, but these methods model the protocols at a very high 
level of abstraction, where implementation details are hidden. Pnueli et al. [27] and 
Arons et al. [4] present a general, sound-but-incomplete method for verifying param- 
eterized systems. This work is similar in spirit to ours (and inspired our specification 
formalism) in that an incomplete procedure is used to guess a candidate, whose verifica- 
tion proves the property for the entire family. Their method, however, attempts to derive 
an inductive invariant for proving simple assertions, whereas ours derives an abstract 
protocol that preserves violations of SSC. All of these works can handle protocol fami- 
lies with parameterized number of processors, which we cannot. On the other hand, our 
method proves SSC rather than weaker safety properties, and applies to any protocol 
that can be expressed in the syntax of Sec. 4. 

Several proofs of SC for parameterized protocols are manual in nature. Some [21, 
26, 3] use theorem-provers, while others use formal frameworks to provide a rigorous 
proof [24]. In either case, a human must develop insights needed to build up the proof, 
and the process is quite time-consuming. 

Qadeer’s method [29] for verifying sequential consistency of finite-state cache co- 
herence protocols actually provides a parameterized proof in one dimension, namely 
data values. Our work can be viewed as an extension of Qadeer’s to two dimensions 
(data values and addresses). The only other approach known to us that uses model 
checking to verify sequential consistency of parameterized families of protocols is that 
of Henzinger et al. [15]. Their semi-automatic method, which builds on a method for 
structural induction on processes, can handle families of protocols that are parameter- 
ized in the number of processors, in addition to the number of addresses and data values. 
However, a limitation of their method is that its soundness relies on the assumption that 
the protocol to be verified is location monotonic, in the sense that any trace of the sys- 
tem projected onto a subset of locations is a trace of the system with just that subset 
of locations. Henzinger et al. do not provide a method (automated or otherwise) for 
testing location monotonicity of a parameterized family of protocols. (Nalumasu [25] 
does provide a framework for expressing protocols that guarantees location monotonic- 
ity, but the framework has very limited expressiveness.) Moreover, the Henzinger et al. 
method works only for protocols whose set of traces can be reordered by a finite state 
machine to form serial traces, a restriction that typically holds in practice only for pro- 
tocols that are already fully serial. While protocols with traces that are not in simple 
SC can be handled, many protocols do not have finite-state observers; examples include 
both protocols described in this paper and the lazy caching protocol of Afek et al. [1]. 
(Although Henzinger et al. state in their paper that the lazy caching protocol has a finite 
state observer, this is only true of their simplified version.) In summary, while the Hen- 
zinger et al. method does allow the number of processors to be parameterized (unlike 
ours), their method is not automatic and applies to a limited protocol class that excludes 
important real protocols. 
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Abstract. We describe a viable approach for memory abstraction that pre- 
serves memory semantics, thereby augmenting the capability of SAT-based 
BMC to handle designs with large embedded memory without explicitly 
modeling each memory bit. Our method does not require examining the design 
or changing the SAT-solver and is guaranteed not to generate false negatives. 
The proposed method is similar, but with key enhancements, to the previous 
abstract interpretation of memory that captures its forwarding semantics, i.e., a 
data read from a memory location is same as the most recent data written at the 
same location. In our method, we construct an abstract model for BMC by 
eliminating memory arrays, but retaining the memory interface signals and 
adding constraints on those signals at every analysis depth to preserve the 
memory semantics. The size of these memory-modeling constraints depends 
quadratically on the number of memory accesses and linearly on the bus widths 
of memory interface signals. Since the analysis depth of BMC bounds the num- 
ber of memory accesses, these constraints are significantly smaller than the ex- 
plicit memory model. The novelty of our memory-modeling constraints is that 
they capture the exclusivity of a read and write pair explicitly, i.e., when a 
SAT-solver decides on a valid read and write pair, other pairs are implied 
invalid immediately, thereby reducing the solve time. We have implemented 
these techniques in our SAT-based BMC framework where we demonstrate the 
effectiveness of such an abstraction on a number of hardware and software de- 
signs with large embedded memories. We show at least an order of magnitude 
improvement (both in time and space) using our method over explicit modeling 
of each memory bit. We also show that our method of adding constraints boosts 
the performance of the SAT solver (in BMC) significantly as opposed to the 
conventional way of modeling these constraints as nested if-then-else expres- 
sions. 



1 Introduction 

Formal verification techniques like SAT-based Bounded Model Checking (BMC) 
[1-4] enjoy several nice properties over BDD-based symbolic model checking [5, 6]; 
their performance is less sensitive to the problem sizes and they do not suffer from 
space explosion. Due to the many recent advances in DPLL-style SAT solvers [7-11], 
SAT-based BMC can handle much larger designs and analyze them faster than be- 
fore. 

R. Alur and D.A. Peled (Eds.): CAV 2004, LNCS 3114, pp. 440-452, 2004. 
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Designs with large embedded memories are quite common and have wide applica- 
tion. However, these embedded memories add further complexity to formal verifica- 
tion tasks due to an exponential increase in state space with each additional memory 
bit. In typical BMC approaches [1-4], the search space increases with each time- 
frame unrolling of a design. With explicit modeling of large embedded memories, the 
search space frequently becomes prohibitively large to analyze even beyond a reason- 
able depth. In order to make BMC more useful, it is important to have some abstrac- 
tion of these memories. However, for finding real bugs, it is sufficient that the ab- 
straction techniques capture the memory semantics [12] without explicitly modeling 
each memory bit. In the following, we discuss some of the existing abstraction tech- 
niques. 

In bottom-up abstraction and refinement techniques [13-18], starting from a small 
abstract model of the concrete design, counter-examples discovered are used to refine 
the model iteratively. In practice, several iterations are needed before a property can 
be proved correct or a real counter-example can be found. Note that after every itera- 
tive refinement step the model size increases, making it increasingly difficult to ver- 
ify. In contrast to the bottom-up approaches, top-down approaches [19, 20] use reso- 
lution-based proof techniques in SAT to generate the abstract model, starting from a 
concrete design. As shown in [20], one can use iterative abstraction to progressively 
reduce the model size. Since these approaches use the concrete model to start with, it 
may not be feasible to apply them on designs with large memories. Moreover, both 
approaches, in general, are not geared towards extracting the memory semantics. 

To capture the memory semantics, Burch and Dill introduced the interpreted read 
and write operations in their logic of equality with un-interpreted functions (EUF) 
[12] instead of naive un-interpreted abstraction of memories. These interpreted func- 
tions were used to represent the memory symbolically by creating nested if-then-else 
(ITE) expressions to record the history of writes to the memory. Such partial interpre- 
tation of memory has also been exploited in later derivative verification efforts [21- 
24]. Specifically in [21], Velev et al. used this partial interpretation in a symbolic 
simulation engine to replace memory by a behavioral model that interacts with the 
rest of the circuit through a software interface that monitors the memory control sig- 
nals. Bryant et al. proposed the logic of Counter arithmetic with Lambda expressions 
and Un-interpreted functions (CLU) to model infinite- state systems and unbounded 
memory in the UCLID system [25]. Memory is modeled as a functional expression 
whose body changes with each time step. Similar to [12], the memory is represented 
symbolically by creating nested ITE expressions. One can use BMC to verify safety 
properties with this restricted CLU logic. 

In this paper, we describe a viable approach for memory abstraction that preserves 
memory semantics, thereby augmenting the capability of SAT-based BMC to handle 
designs with large embedded memory without explicitly modeling each memory bit. 
Our method does not require examining the design or changing the SAT-solver and is 
guaranteed not to generate false negatives. The proposed method is similar, but with 
key enhancements, to the abstract interpretation of memory [12, 21] that captures its 
forwarding semantics, i.e., a data read from a memory location is the same as the 
most recent data written at the same location. In our method, we construct an abstract 
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model for BMC by eliminating memory arrays, but retaining the memory interface 
signals and adding constraints on those signals at every analysis depth to preserve the 
semantics of the memory. The size of these memory-modeling constraints depends 
quadratically on the number of memory accesses and linearly on the bus widths of 
memory interface signals. Since the analysis depth of BMC bounds the number of 
memory accesses, these constraints are significantly smaller than the explicit memory 
model. The novelty of our memory-modeling constraints is that they capture the 
exclusivity of a read and write pair explicitly, i.e., when a SAT-solver decides on a 
valid read and write pair, other pairs are implied invalid immediately, thereby 
reducing the solve time. We have implemented these techniques in our SAT-based 
BMC framework where we demonstrate the effectiveness of such an abstraction on a 
number of hardware and software designs with large embedded memories. We show 
at least an order of magnitude improvement (both in time and space) using our 
method over explicit modeling of each memory bit. We also show that our method of 
adding constraints boosts the performance of the SAT solver (in BMC) significantly 
as opposed to the conventional way of modeling these constraints as nested if-then- 
else expressions [12]. 

Outline. In Section 2 we give basic idea used in our approach, in Section 3 we give 
relevant background on BMC and memory semantics, in Section 4 we discuss our 
approach in detail, in Section 5 we discuss our experiments, and in Section 6 we con- 
clude with summarizing remarks and future work. 

2 Basic Idea 

For typical designs with embedded memory, a Main module interacts with the mem- 
ory module MEM using the interface signals as shown in Figure 1 . For a single-port 
memory at any given clock cycle, the following observations can be made: a) at most 
one address is valid, b) at most one write occurs, and c) at most one read occurs. 



Addr: Address Bus 
WD : Write Data 
RD : Read Data 
WE : Write Enable 
RE : Read Enable 

Fig. 1. Design with embedded memory 




As BMC-search-bound k becomes larger, the unrolled design size increases linearly 
with the size of memory bits. For designs with large embedded memories, this in- 
creases the search space prohibitively for any search engine to analyze. 

It can be observed in this memory model that memory read at any depth depends 
only on the most recent data written previously at the same address. Therefore, to 
enable the SAT-based BMC to analyze deeper on such designs, we replace an explicit 
memory model with an efficient memory model as follows: 
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a) We remove the MEM module but retain the memory interface signals and the 
input-output directionality with respect to the Main module. 

b) We add constraints at every analysis depth k on the memory interface signals that 
preserve the forwarding semantics of the memory. 

To improve the SAT solver (in BMC), we also do the following: 

c) We add constraints such that when the SAT-solver decides on a valid read and 
write pair, other pairs are implied invalid immediately. 

Note that although a) and b) are sufficient to generate an efficient model that pre- 
serves the validity of a correctness property, c) makes the performance of the SAT- 
based BMC superior as observed in our experiments. Moreover, we do not have to 
examine the Main module while adding these memory-modeling constraints. Though 
we consider a single-port memory for the ensuing discussion, our method is also 
extendable to multi-port memories. In the next two sections, we will give some rele- 
vant background and then formalize our contribution. 



3 Background 



Bounded Model Checking 

In BMC, the specification is expressed in LTL (Linear Temporal Logic). Given a 
Kripke structure M, an LTL formula /, and a bound k, the translation task in BMC is 
to construct a propositional formula [M, fj^,, such that the formula is satisfiable if and 
only if there exists a witness of length k [26, 27]. The satisfiability check is performed 
by a backend SAT solver. Verification typically proceeds by looking for witnesses or 
counter-examples (CE) of increasing length until the completeness threshold is 
reached [26, 27]. The overall algorithm of a SAT-based BMC method for checking 
(or falsifying) a simple safety property is shown in the Figure 2. Note that P‘ denotes 
the property node at i“' unrolling. The SAT problems generated by the BMC transla- 
tion procedure grow bigger as k increases. Therefore, the practical efficiency of the 
backend SAT solver becomes critical in enabling deeper searches to be performed. 

BMC (k, P) { //Falsify safety property P within bound k 
for (int i=0; i<=k ; i++) { 

P^= Unroll{P,i) ///Property node at i“ unrolling 
if (SAT_SoIve (P'=0) =SAT) return CE;//Try to falsify 

return NO_CE; } //No counter-example found 



Fig. 2. SAT-based BMC for Safety Property P 



Memory Semantics 

Embedded memories are used in several forms such as RAM, stack, and FIFO with at 
least one port for data access. For our discussion, we will assume a single-port mem- 
ory, as shown in Figure 1, with the following interface signals: Address Bus (Addr), 
Write Data Bus (WD), Read Data Bus (RD), Write Enable (WE), and Read Enable 
(RE). The write phase of the memory is a two-cycles event, i.e., in the current clock 
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cycle when the data value is assigned to WD bus, the write address location is as- 
signed to the Addr bus and the WE signal is made active, the new data is then avail- 
able in the next clock cycle. The read phase of memory is a one-cycle event, i.e., 
when the read address location is assigned to the Addr bus and RE is made active, the 
read data is assigned to the RD bus in the same clock cycle. 

Assume that we unroll the design up to depth k (starts from 0). Let S' denote a 
memory interface signal variable S at time frame j. Let the Boolean variable E‘' de- 
note the address comparison between time frames i and j and be defined as E‘'=(Addr‘ 
=Addr'). Then the forwarding semantics of the memory can be expressed as: 

RD" = {WD I D' =1aWE'=1aRE =1aVj^,^(E‘- = 0vWE‘=0)}, where j<k (1) 

In other words, data read at depth k equals the data written at depth j if addresses are 
equal at k and j, write enable is active at j, read enable is active at k, and for all depths 
between j and k, either addresses are different from that of k or no write happened. 

4 Our Approach 

In our approach, we augment SAT-based BMC (a part of our formal verification 
platform) with a mechanism to add memory-modeling constraints at every unroll 
depth of BMC analysis. We use a hybrid SAT solver [10] that uses hybrid representa- 
tions of Boolean constraints, i.e., 2-input OR/INVERTER gates to represent the 
original circuit problem and CNE to represent the learned constraints. In this work, 
we extend the use of hybrid representation to model memory constraints efficiently. 
We specifically compare it with only a 2-input uniform gate (instead of a multi-input 
gate) representation, as it was already shown in [11] that efficient Boolean constraint 
propagation can be achieved using a fast table lookup scheme on such a representa- 
tion. 

In order to add the constraints for the forwarding semantics of memory as in (1), 
one can use a conventional approach based on the selection operator ITE* in the fol- 
lowing way. Let the Boolean variable s'* denote the valid read signal and be defined 
as s'* = E'*aWE'. Then the data read at depth k (given RE*=1) is expressed as: 

RD* = ITE(s* '*, WD* ‘, ITE(s*'"*, WD* . . . .ITE(s"*, WD", WD ')) . . . ) (2) 

where WD* denotes the initial data value, same in all memory locations. We can 
extend (2) to handle non-uniform memory initialization as well. 

Note that when constraints are added as above, the decision s' =1 does not neces- 
sarily imply RD*=WD‘; other pairs need to be established invalid through decision 
procedures as well, i.e., s“'*=0, s‘'^^'*=0,...,s*'*=0. Instead we add explicit constraints 
to capture the read and write pairs exclusively. Once a read and write pair is chosen 
by the SAT-solver, the other pairs are implied invalid immediately. Let the Boolean 



* ITE when applied on three Boolean variables is defined as ITE(s,t,e) = (s & t) | (!s & e). 
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variable S'* denote the exclusive valid read signal and the Boolean variable PS‘* de- 
note the intermediate exclusive signal. They are defined as follows: 

PS‘* = !s‘*aPS"'* (=RE" fori=k) 

S‘* =s‘*aPS‘"‘* (=PS"*fori=-l) (3) 

Now equation (1) can be expressed as 

RD" = (S*'*aWD*')v (S“*aWD“ ) V ... V (S^WD" )v(S''*aWD‘‘ ) (4) 

Note that S‘*=l, immediately implies S*‘‘=0 where j;^i and i,j < k. 

4.1 Efficient Representation of Memory Modeling Constraints 

As mentioned earlier, we use a hybrid representation for building constraints. We 
capture equations (3) and (4) efficiently 

1) by not representing the constraints as a large tree-based structure since such a 
structure adversely affects the BCP performance as observed in the context of add- 
ing large conflict clauses [10], and 

2) by not creating unnecessary 2-literal clauses since they too adversely affect a 
CNF-based SAT-solver that uses 2-literal watch scheme for BCP [28]. 

We implemented the addition of memory modeling constraints as part of the proce- 
dure EMM_Constraints, which is invoked after every unrolling as shown in the modi- 
fied BMC algorithm (m-BMC) in Figure 3. The procedure EMM_Constraints, as 
shown in Figure 4, generates the constraints at every depth k using 3 sub-procedures: 
Gen_Addr_Equal_Sig, Gen_Valid_Read_Sig, and Gen_Read_Data_Constraints. It 
then returns the accumulated constraints C' up to depth i. As we see in the following 
detailed discussion, these constraints C'‘ capture the forwarding semantics of the mem- 
ory very efficiently up to depth k. 



//BMC with efficient memory modeling 
m-BMC (k,P) { 

C '=([); //initialize memory modeling constraints 
for (int i=0; i<=k ; i++) { 

P'= Unroll (P, i) ; //Get property node at i"" unrolling 
C'=SMM_Constraints ( i , C' ') ; //update the constraints 
if (SAT_SoIve ( p'=0aC'= 1) =SAT) return CE;}//Try to falsify 
return NO_CE; } //No counter-example found 



Fig. 3. Improved SAT-based BMC with Efficient Memory Modeling 

Gen_Addr_Equal_Sig: Generation of Address Comparison Signals 

Fet m denote the bit width of the memory address bus. We implement the address 
comparison as follows: for every address pair comparison (Addr’=Addr’‘) we intro- 
duce new variables F** and e*'‘j (for every bit i). Then we add the following CNF 
clauses for each i: 
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(!£'•" + Addr*, + !Addr") , (IE*' + !Addi^. + AddE), 

+ AddPj + Addr"),' + !AddPj +‘!Addr") ‘ 

Finally, one clause to connect the relation between and e^*, i.e., 

(!d’>...+!e'>... + 

Note that these clauses capture the relation that E'' =1 if and only if (Addr'=Addr*‘). 
The naive way to express the same equivalence relation structurally would be to use 
an AND-tree of X-NOR (®) gates as follows: 

E" = (Addr*„®Addr\) a ...a (Addf„,.j®Addr"„.,) 

Clearly, this representation would require 4m-l 2-input OR gates, amounting to 
12m-3 equivalent CNE clauses (3 clauses per gate). Our representation on the other 
hand, requires only 4m+l clauses and does not require any 2-literal clause. Thus, at 
every depth k, we add only (4m+l)k clauses for address comparison rather than the 
(12m-3)k gates clauses required by the naive approach. 



//Modeling of memory constraint at depth k 
//where C is cumulative constraints up to depth k-1 
EMM_Constraints (k, C) { 

/ /Generate address equal signals 
Gen_Addr__Equal_Sig Ck) ; 

/ /Generate exclusive valid read signals 
Gen_Valid_Read_Sig (k ) ; 

/ /Generate constraints on read data 
C(k) = Gen_Read_Data_Constraints (k) ; 
return C U C (k) ; } 



Fig. 4. Efficient Modeling of Memory Constraint 



Gen_Valid_Read_Sig: Generation of exclusive valid read signals 



To represent the exclusive valid read signals as in equation (3), we use a 2-input gate 
representation rather than CNF clauses. Since each intermediate variable has fan-outs 
to other signals, we cannot eliminate them. If we were representing those using CNF 
clauses, it would introduce too many additional 2-literal clauses. This representation 
adds 3k 2-input gates (or 9k gate clauses) at every depth k. 



Gen_Read_Data_Constraints: Generation of constraints on read data signals 

By virtue of equation (3), we know that for a given k, at most one S^*=l, V We 
use this fact to represent the constraint in equation (4) as CNF clauses. Let n denote 
the bit width of the data bus. We add the following clauses V 
(!S'" H- !RD“, H- WDi), (!S'* H- RD" 4- !WEf,) 

To capture validity of read signal, we add the following clause 
(!RE"h-S'‘-‘'h-...-hS'*-i-...h-S"‘'*) 

Thus we add 2n(k+l)+l clauses at every depth k. On the other hand, if we use gate 
representation, it would require n(2k+l) gates and therefore, 3n(2k+l ) gate clauses. 
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Overall, at every depth k, our hybrid exclusive select signals representation adds 
(4m+2n+l )k+2n+l clauses and 3k gates as compared to (4m+2n+2)k +n gates in a 
purely circuit representation. Note that though the size of these accumulated con- 
straints grows quadratically with depth k, they are still significantly smaller than the 
explicit memory-model. 



5 Experiments 



For our experiments, we used three well-known recursive software programs Fibo- 
nacci, 3n+l, and Towers-of Hanoi with an embedded stack as shown in Figure 5 and 
one hardware design with embedded RAM. In each of these cases, we chose a safety 
property that makes the modeling of the entire memory imperative, i.e., we simply 
cannot remove away the memory from the design. We translated each of the software 
programs into equivalent hardware models using Verilog HDL using a stack model. 
For each of the software designs, we use an inverse function to describe the negated 
safety property that requires a non-trivial state space search, e.g., given a certain 
value of fibonacci number does there exist a corresponding n? (Similar queries are 
made for a given number of recursive calls to terminate 3n+l, and for a given num- 
ber of legal moves required in Towers-of-Hanoi.) 



1. //Fibonacci 

2. //cache and recursion 

3. fib(n) { 

4. if (n<2) return n,- 

5. //cache lookup 

6. if (lookup (n, Scf) ) 

7 . return f ; 

8. f = f ib (n-1) +f ib (n-2) ; 

9. //insert cache 
10 . store (n, f ) ; 

11. return f; } 

12 . //3n+l 

13. //period tracks # of 

14. //calls req. to converge; 

15. //initialized to 0 



16. 3nPlusl(n) { 

17. if (n==l) return; 

18. if ((odd(n)) 

19. 3nPlusl (3*n+l) ; 

20. else 

21. 3nPlusl (n/2) ; 

22. period++; } 

23. //Towers of Hanoi 

24. //count tracks # of moves 

25. //req.; initialized to 0 
26 . toh (n, s , d, a) { 

27. if (n==0) return; 

28. toh(n-i,s,a,d) ; 

29. count++; 

30. toh (n-1 , a, d, s) ; } 



Fig. 5. Software programs with embedded stack used in experiments 



We conducted our experiments on a workstation composed of dual Intel 2.8 GHz 
Xeon Processors with 4GB physical memory running Red Hat Linux 7.2 using a 3 
hours time limit for each BMC run. We compare the performance of augmented 
SAT-based BMC (m-BMC) for handling embedded memory with basic SAT-based 
BMC using explicit memory modeling. We also compare the performance of m-BMC 
using our hybrid exclusive select signal representation (hESS), with that of a hybrid 
nested ITE representation (hlTE). In addition, we show the effect on their perform- 
ance with increasing memory sizes for a given property and design. 
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We performed our first set of experiments on the hardware models of the software 
programs with several properties selected as described above. Each of the properties 
has a non-trivial witness and is listed in Tables 1-4 in the order of increasing search 
complexity. We used a fixed memory size in each of the models. We also used one 
industrial hardware design with a safety property that is not known to have a counter 
example. For these properties, we show the performance and memory utilization 
comparison of the memory modeling styles, Le., explicit memory modeling, memory 
modeling using our hESS representation and that using hITE representation in the 
Tables 1-4. In Tables 1-3, we show comparison results for Fibonacci, 3n+l, and 
Towers-of-Hanoi, respectively. We used address width (AW)=12, data width 
(DW)=32 for Fibonacci, AW=12, DW=2 for 3n+l, and AW=12, DW=22 for Tow- 
ers-of-Hanoi models. In Table 4, we show comparison results for the industrial hard- 
ware design with a given safety property S for various intermediate analysis depths as 
the property was not violated within the resource limit. Without the memory, the 
design has -400 latches and ~5k gates. The memory module has AW=12 and 
DW=12. 

In Tables 1-4, the E' Column shows the properties with increasing complexity, the 
2”“’ Column shows the witness depth (intermediate analysis depth in Table 4), 3-7 
Columns show the performance figures and 8-12 Columns show the memory utiliza- 
tion figures. Specifically in the performance columns. Columns 3-5 show the BMC 
search time taken (in seconds) for explicit memory modeling (PI), using hITE (P2), 
and using hESS (P3) respectively; Columns 6 and 7 show the speed up (ratio) using 
hESS over the explicit memory modeling and hITE respectively. For the memory 



Table 1. Comparison of memory modeling on Fibonacci model (AW=12, DW=32) 



Prp 


Wit 

Depth 


Pcrfomiancc 


Memory Utilization 


Explicit 

Pl(s) 


hlTE 

P2(s) 


hESS 

P3(s) 


Speed 

P1/P3 


Speed 

P1/P3 


Explicit 
Ml (mb) 


hlTE 

M2(mb) 


hESS 

M3(mb) 


Red. 

M3/M1 


Red. 

M3/M2 


l-l 


14 


179 


1 


1 


146 


1.1 


517 


7 


6 


0.01 


0.86 


1-2 


25 


1050 


5 


4 


248 


1.3 


1411 


12 


10 


0.01 


0.83 


1-3 


38 


2835 


20 


15 


184 


1.3 


2239 


22 


17 


0.01 


0.77 


1-4 


51 


NA 


79 


47 


NA 


1.7 


MO 


41 


34 


NA 


0.83 


1-5 


64 


NA 


125 


100 


NA 


1.3 


MO 


63 


52 


NA 


0.83 


1-6 


77 


NA 


252 


311 


NA 


0.8 


MO 


100 


75 


NA 


0.75 


1-7 


90 


NA 


587 


362 


NA 


1.6 


MO 


175 


92 


NA 


0.53 


1-8 


103 


NA 


625 


557 


NA 


1.1 


MO 


163 


161 


NA 


0.99 


1-9 


116 


NA 


1060 


674 


NA 


1.6 


MO 


189 


187 


NA 


0.99 


1-10 


129 


NA 


1674 


1359 


NA 


1.2 


MO 


343 


204 


NA 


0.59 


1-11 


142 


NA 


3782 


2165 


NA 


1.7 


MO 


353 


372 


NA 


1.05 


1-12 


155 


NA 


2980 


2043 


NA 


1.5 


MO 


421 


303 


NA 


0.72 


1-13 


168 


NA 


4349 


4517 


NA 


1.0 


MO 


319 


623 


NA 


1.95 


1-14 


181 


NA 


5573 


4010 


NA 


1.4 


MO 


485 


335 


NA 


0.69 


1-15 


194 


NA 


6973 


4889 


NA 


1.4 


MO 


558 


531 


NA 


0.95 


1-16 


207 


NA 


>3hr 


7330 


NA 


NA 


.MO 


541 


461 


NA 


0.85 
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Table 2. Comparison of memory modeling on 3n+l model (AW=12, DW=2) 



Prp 


Wit 

Depth 


Performance 


Memory Utilization 


Explicit 

Pl(s) 


hITE 

P2(s) 


hESS 

P3(s) 


Speed 
PI P3 


Speed 

P2/P3 


Explicit 
M l(mb) 


hlTE 

M2(mb) 


hESS 

M3(mb) 


Red. 

M3/M1 


Red. 

M3/MI 


2-1 


44 


2736 


85 


51 


54 


1.7 


293 


25 


20 


0.07 


0.8 


2-2 


47 


3837 


109 


138 


28 


0.8 


314 


30 


37 


0.12 


1.23 


2-3 


50 


281 1 


167 


160 


18 


1.0 


412 


44 


37 


0.09 


0.84 


2-4 


53 


3236 


205 


207 


16 


1.0 


407 


42 


58 


0.14 


1.38 


2-5 


56 


5643 


258 


264 


21 


1.0 


569 


48 


44 


0.08 


0.92 


2-6 


59 


4518 


312 


277 


16 


1.1 


432 


56 


49 


0.11 


0.88 


2-7 


62 


9078 


324 


368 


25 


0.9 


479 


58 


59 


0.12 


1.02 


2-8 


65 


9613 


426 


483 


20 


0.9 


585 


72 


85 


0.15 


1.18 


2-9 


68 


10446 


487 


522 


20 


0.9 


648 


73 


64 


0.10 


0.88 


2-10 


71 


9903 


562 


590 


17 


1.0 


668 


82 


74 


0.11 


0.90 


2-11 


74 


> 3 hr 


674 


692 


NA 


1.0 


981 


83 


92 


NA 


1.11 


2-12 


77 


> 3 hr 


910 


746 


NA 


1.2 


719 


no 


83 


NA 


0.75 


2-13 


80 


> 3hr 


820 


861 


NA 


1.0 


875 


106 


89 


NA 


0.84 


2-14 


83 


>3hr 


969 


990 


NA 


1.0 


586 


113 


80 


NA 


0.71 


2-15 


89 


>3hr 


1292 


1201 


NA 


1.1 


659 


127 


1 13 


NA 


0.89 



Table 3. Comparison of memory modeling on Towers-of-Hanoi (AW=12, DW=22) 



Prp 


Wit 

Depth 


Performance 


Memory Utilization 


E.xplicit 
PI (sec) 


hITE 

P2(s) 


hESS 

P3(s) 


Speed 
PI P3 


Speed 

P2/P3 


E.xplicit 
M l(mb) 


hITE 

M2(mb) 


hESS 

M3(mb) 


Red 

M3/M1 


Red. 

M3/M2 


3-1 


10 


4 


0 


0 


149 


1.3 


71 


3 


3 


0.04 


1.00 


3-2 


24 


182 


1 


1 


264 


1.2 


664 


6 


5 


0.01 


0.83 


3-3 


52 


2587 


13 


10 


255 


1.2 


2059 


16 


12 


0.01 


0.75 


3-4 


108 


NA 


229 


129 


NA 


1.8 


MO 


68 


43 


NA 


0.63 


3-5 


220 


NA 


1266 


838 


NA 


1.5 


MO 


214 


143 


NA 


0.67 


3-6 


444 


NA 


8232 


6925 


NA 


1.2 


MO 


845 


569 


NA 


0.67 



utilization columns, Columns 8-10 show the memory used (in MB) hy explicit mem- 
ory modeling (Ml), using hITE (M2), and using hESS (M3) respectively; Column 1 1 
and 12 show the memory usage reduction (ratio) using hESS over the explicit mem- 
ory modeling and hITE respectively (Note, MO = Memory Out, NA = Not Ap- 
plicable). 

Observing the performance figures in Column 6 of the Tables 1-4, we see that our 
approach improves the performance of BMC by 7-2 orders of magnitude when com- 
pared to explicit memory modeling. Similarly, as shown in Column 11 of these ta- 
bles, there is a reduction in memory utilization by 7-2 orders of magnitude by the use 
of our approach. Moreover, our modeling style of using the hybrid exclusive select 
signals representation is better than the hybrid nested ITE, as shown in Column 7 and 
12. Noticeably, in the last row of Table 1 and 4, hITE times out while our approach 
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Table 4. Comparison of memory modeling on industrial hardware design (AW=12, DW=12) 



Prp 


Inter 

Depth 


Performance 


Memory Utilization 


Explicit 

Pl(s) 


hITE 

P2(s) 


hESS 

P3(s) 


Speed 

P1/P3 


Speed 

P2/P3 


Explicit 
M l(mb) 


hITE 

M2(mb) 


hESS 

M3(mb) 


Rod. 
M3/M 1 


Red. 

M3/M2 


S 


68 


10680 


1264 


925 


1 1 


1.3 


2049 


91 


64 


0.03 


0.7 


150 


NA 


9218 


7140 


NA 


1.3 


MO 


770 


261 


NA 


0.3 


178 


NA 


>3hr 


10272 


NA 


NA 


MO 


NA 


908 


NA 


NA 



Table 5. Comparison of memory modeling (for 3n+l) with DW=12 and varying address width 



Pr 

P 




Perfonnance 


Memorx' Utilization 


AW 


Explicit 

PKs) 


hITE 

P2(s) 


hESS 

P3(s) 


Speed 

PI.P3 


Speed 

P2/P3 


Explicit 
M l(mb) 


hITE 

M2(mb) 


hESS 

M3(mb) 


Red 

M3/MI 


Red 

M3/M2 




4 


64 


85 


60 


1.1 


1.4 


23 


22 


20 


0.87 


0.9 




5 


81 


72 


47 


1.7 


1.5 


21 


22 


17 


0.81 


0.8 




6 


1 10 


77 


79 


1.4 


1.0 


25 


23 


24 


0.96 


1.0 




7 


117 


99 


53 


2.2 


1.9 


28 


25 


19 


0.68 


0.8 




8 


146 


87 


78 


1.9 


1.1 


41 


24 


24 


0.59 


1.0 


2-1 


9 


265 


79 


73 


3.6 


I.I 


49 


25 


22 


0.45 


0.9 




10 


767 


86 


59 


12.9 


1.4 


95 


27 


22 


0.23 


0.8 




1 1 


1490 


89 


56 


26.4 


1.6 


153 


27 


20 


0.13 


0.7 




12 


2736 


85 


51 


54.1 


1.7 


293 


25 


20 


0.07 


0.8 




13 


3759 


83 


54 


69.6 


1.5 


569 


24 


21 


0.04 


0.9 




14 


11583 


81 


46 


249.2 


1.7 


1452 


25 


18 


0.01 


0.7 



using hESS completes the analysis within the 3 hours time limit. Note that due to tail 
recursive nature of the 3n+l program, the search complexity is not severe and there- 
fore, we don’t see the consistent benefit of exclusive select signals for this example in 
Table 2. On average, we see a performance improvement of 30% and a reduction in 
memory utilization of 20%, noticeably more at higher analysis depths. 

In the second set of experiments, we used different memory sizes for the model 
3n+l and the property 2-7. We varied the address bus width AW from 4 to 14 bits 
and compare the performance and memory utilization of the three approaches as 
shown in Column 2 of Table 5. The description of the remaining Columns in Table 5 
is same as that in Tables 1-4. As shown in Columns 6 and 10, the performance im- 
provement and memory usage reduction gets more pronounced, about 2 orders of 
magnitude, with increasing memory size. Clearly, the merits from adding memory- 
modeling constraints outweigh its quadratic growth. Moreover, our approach show on 
average 50% performance improvement and 20% memory usage reduction over 
nested ITE expressions. 

6 Conclusions 

Verifying designs with large embedded memories is typically handled by abstracting 
out (over-approximating) the memories. Such abstraction is generally not useful for 
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finding real bugs. Current SAT-based BMC efforts are incapable of handling designs 
with explicit memory modeling due to enormously increased search space complex- 
ity. In this work, we are the first to use memory-modeling constraints to augment 
SAT-based BMC in order to handle embedded memory designs without explicitly 
modeling each memory bit. Our method does not require transforming the design and 
is also guaranteed not to generate false negatives. This method is similar to abstract 
interpretation of memory [12, 21], but with key enhancements. Our method can easily 
augment any SAT-based BMC effort. We demonstrate the effectiveness of our ap- 
proach on a number of software and hardware designs with large embedded memo- 
ries. We show about 1-2 orders of magnitude time and space improvement using our 
method over explicit modeling of each memory bit. We also show that our method of 
adding constraints boosts the performance of a SAT solver significantly as opposed to 
the conventional way of modeling these constraints as nested ITE expressions [12]. 
While the growth of memory modeling constraint clauses is quadratic with the analy- 
sis depth, we also observed that though the constraint clauses are sufficient they may 
not be necessary in every time frame. As an ongoing effort, we are further investigat- 
ing the possibility of adding these clauses only as and when required. 
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Abstract. The counterexamples produced by model checkers are often 
lengthy and difficult to understand. In practical verification, showing the 
existence of a (potential) bug is not enough: the error must be under- 
stood, determined to not be a result of faulty specification or assump- 
tions, and, finally, located and corrected. The explain tool uses distance 
metrics on program executions to provide automated assistance in under- 
standing and localizing errors in ANSTC programs, explain is integrated 
with CBMC, a bounded model checker for the C language, and features 
a GUI front-end that presents error explanations to the user. 



1 Introduction 

In an ideal world, given a detailed error trace, a programmer would always be 
able to quickly identify and correct the faulty portion of the code or specification. 
Practical experience, however, indicates that this is not the case. Understanding 
a counterexample often requires as much effort as preparing a program for model 
checking. As software model checking has become more concerned with practical 
applicability, the need for automated assistance in understanding counterexam- 
ples has been recognized [2,6]. The explain tool provides users with assistance 
in focusing on the relevant portions of source code and in understanding the 
causal dependencies involved in an error. 

CBMC [7] is a tool for verifying ANSI-C programs. CBMC is a bounded 
model checker (BMC) [3]: it produces from a C program a Boolean formula 
satisfiable by executions of the program that violate its specification (counterex- 
amples). The model checker supports pointer constructs, dynamic memory allo- 
cation, recursion, and the float and double data types. CBMC also features a 
graphical user interface designed to resemble an IDE (Integrated Development 
Environment) that allows users to interactively step through counterexample 
traces. 

explain uses the same bounded model checking engine to further analyze 
counterexample traces produced by CBMC. In particular, explain uses distance 
metrics on program executions [5], in a manner inspired by the counterfactual 
theory of causality [8], to provide a number of automatic analyses: 

— Given a counterexample execution, explain can automatically produce an 
execution that is as similar as possible to the failing run but does not violate 
the specification. 
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— explain can also automatically produce a new counterexample that is as 
different as possible from the original counterexample. 

— Finally, explain can determine causal dependencies between predicates in 
an execution. 

explain is used through the same GUI as CBMC. The interface allows users 
to step through explanatory traces as they would in a debugger (with the ability 
to step forwards and backwards). Portions of the code that explain suggests 
may be faulty are highlighted for the user. 

2 Using explain 

Using explain is an interactive process. The tool assists the user in understand- 
ing counterexamples, but knowledge of the program (and the specification) is 
necessary to guide the tool. As an example, we will use explain to narrow in 
on an error in a small but non-trivial C program. 

2.1 Debugging TCAS 

TCAS (Traffic Alert and Collision Avoidance System) is an aircraft conflict de- 
tection and resolution system used by all US commercial aircraft. The Georgia 
Tech version of the Siemens suite [9] includes 41 buggy versions of ANSI-C code 
for the Resolution Advisory (RA) component of the TCAS system. A specifica- 
tion for this code (in the form of assertions) is available from another study [4]. 

The first step in using explain to understand an error is to produce a coun- 
terexample. We load tcas.c into the GUI and run the CBMC model checker. 
After a few seconds, the GUI reports that the assertion on line 257 has been 
violated. 

The counterexample execution passes through 112 states. Single-stepping 
through the trace looking for a bug is not an appealing prospect, so we turn to 
explain for assistance in understanding what is wrong with our code. We run 
explain on the counterexample to find a successful execution that is as similar 
as possible to the failing run. explain uses the PBS [1] pseudo-Boolean solver to 
produce this trace, and lists the changes made to the original counterexample. 
The GUI highlights the lines that are involved in the changes^. 

Unfortunately, the explanation is less than useful. The failed assertion in the 
counterexample is an implication: 

P3_BCond = ( (Input_Up_Separation >= Layer_Positive_RA_Alt_Thresh)&& 
(Input_Down_Separation >= Layer_Positive_RA_Alt_Thresh)&& 
( Input _Own_Tracked_Alt < Input_Other_Tracked_Alt) ) ; 
assert (! (P3_BCond && PrB)); // P3_BCond -> ! PrB 

The successful execution most similar to the counterexample changes the value of 
Input_Down_Separation such that it is now < Layer_Positive_RA_Alt_Thresh, 
and no other values. We are really interested in finding out why, given that 

^ explain uses a causal slicing algorithm [5] to remove changes unrelated to the error. 
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Fig. 1. Correctly locating the error in tcas.c. 

P3_BCond holds, PrB also holds. In other words, we want to know the cause for 
the value of the consequent, not the antecedent, of the implication. Request- 
ing the more precise explanation is a simple matter of adding the constraint 
assume (Pl_BCond) and then rerunning explain. 

The new explanation is more interesting (Figure 1). Input_Down_Separation 
is, again, altered. This time the value has increased, maintaining the original 
value of P3_BCond. Stepping through the source code we notice the first high- 
lighted line in the run: 

result = ! (Own_Below_Threat 0 ) II ( (Own_Below_Threat () ) && 

( ! (Down_Separation > ALIM()))); 

Most of the values in the expression are unchanged from the erroneous run. 
Only Down_Separation has changed, causing result to be FALSE instead of 
TRUE. In the original run, Down_Separation was 500, and now it is 1792. A 
quick examination of the other 4 highlighted lines shows that they simply prop- 
agate the value of result computed. If the comparison of Down_Separation 
and ALIMO had resulted in TRUE in the original run, the assertion would have 
held. Looking at the code, we notice that the ALIM function returns Posi- 
tive_RA_Alt_Thresh [Alt_Layer_Value] . The variable watches window reveals 
that ALIMO has the value 500 in both runs. 

Suspiciously, the value of Down_Separation in the counterexample was also 
500, and the error did not appear when that value changed. It seems likely that 
the value of this expression is the source of the problem. In order to make the 
expression’s value in the faulty run match the value in the successful run, we 
need to change the > into a >= comparison. 
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We modify the source code to reflect our hypothesis about the source of the 
error and rerun CBMC. This time, the model checker reports that verification 
is successful: the program satisfies its specification. 

In other experiments, explain produced a 1 line (correct) localization of a 
127 state locking-protocol counterexample for a 2991 line fragment of a real-time 
OS microkernel. 

3 Conclusions and Future Work 

explain is a tool that uses a model checker to assist users in debugging programs 
(or specifications). The tool is fully integrated with a model checker that precisely 
handles a rich variety of the features of the ANSI-C language, and provides a 
graphical front-end for user interactions. Case studies have demonstrated that 
explain can successfully localize errors in a number of programs. 

In the future, we hope to improve both the graphical interface to explain 
and the underlying explanation algorithms, based on experience with more case 
studies and user feedback. 
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Abstract. Counterexample-driven abstraction refinement is an auto- 
matic process that produces abstract models of finite and infinite-state 
systems. When this process is applied to software, an automatic the- 
orem prover for quantifier-free first-order logic helps to determine the 
feasibility of program paths and to refine the abstraction. In this paper 
we report on a fast, lightweight, and automatic theorem prover called 
Zapato which we have built specifically to solve the queries produced 
during the abstraction refinement process. 



1 Introduction 

Slam is a symbolic software model checker for the C language which combines 
predicate abstraction together with counter-example guided abstraction refine- 
ment. Slam’s overall performance critically depends on the automatic theorem 
prover that is used during the refinement process. Two of Slam’s modules, 
Newton [1] and Constrain [2], generate many theorem proving queries dur- 
ing their analysis^. As an example, when Slam was used to check 30 properties 
of the Microsoft Windows Parallel Port device driver, Newton and Constrain 
called the prover 487,716 times. On average, these queries contained 19 unique 
atoms and 40 instances of Boolean operators per query. In the worst case, one 
query contained 658 unique atoms. Another query contained 96,691 instances of 
the Boolean operators. For the scalability of Slam it is paramount that these 
queries are solved in less than a half a second in the average case. 

In this paper we briefly describe Slam’s theorem prover ~ called Zapato. 
Zapato’s overall architecture is similar to Verifun [5]. However, rather than 
using the Simplex or Fourier & Motzkin arithmetic decision procedures, Zapato 
uses an algorithm due to Harvey & Stuckey [6] for solving constraint problems 
over sets of unit two-variable per inequality integer terms. 



^ C2bp [3] can also be configured to use the theorem prover described in this paper. 
However, by default, C2 bp actually uses a special purpose symbolic theorem prover 
called FastF which is similar to the work of Lahiri, Bryant & Cook [4] but optimized 
for speed and not precision. 
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2 Lazy Theorem Proving Based 
on Propositional Logic SAT 

Zapato takes a query in quantifier-free first-order logic (FOL) and returns valid 
if it can prove the query to be true. In the case that it cannot prove the query 
true, it returns invalid. Note that invalid does not actually mean that the 
query is not valid - only that Zapato could not prove it valid. Zapato operates 
by negating the FOL query passed to it and then trying to prove unsatisfiability. 
Zapato creates a propositional logic (PL) abstraction of the negated query in 
which each atomic formula is replaced by a fresh representative propositional 
variable. A mapping between formulae and propositional variables is maintained 
throughout the execution of the algorithm. 

An incremental PL SAT-solver is used to determine if the propositional ab- 
straction is satisfiable. In the case that it is unsatisfiable, the query is determined 
to be valid. In the case that the abstraction is satisfiable, the model found is 
converted to FOL (via a reverse application of the mapping) and passed to a 
FOL SAT-solver for conjuncts. This model is often partial - meaning that vari- 
ables which can be of either Boolean value in the model are typically eliminated 
by the SAT-solver in a post-backtracking analysis. Since these values are not 
important in the model they can be ignored when determining the satisfiability 
of the model in FOL. 

If the input FOL conjunct solver determines that the found model is satis- 
fiable, then the original query is determined to be invalid. In the case that the 
set of conjuncts is unsatisfiable, the core reason that the model is unsatisfiable 
is determined. This core reason is a subset of the original conjuncts and is found 
by visiting the leaves of the proof tree constructed by the FOL conjunct solver. 
The core reason is then converted to PL using the mapping and inserted into the 
PL SAT-solver’s database of learned clauses. A third possible output from the 
conjunct solver is a set of case-splits, which is translated into additional conflict 
causes. This case is due to the way some axioms are instantiated. The loop is 
repeated until the PL SAT-solver returns unsatisfiable or the FOL solver returns 
satisfiable. 

Zapato’s FOL conjunct solver uses Nelson & Oppen’s method of combining 
decision procedures with equality sharing. Currently we are using this tech- 
nique to combine congruence closure for uninterpreted functions with Harvey 
& Stuckey’s decision procedure. Each of the procedures must produce the rea- 
soning behind their decisions in order to allow for the construction of proofs. 
Zapato’s FOL solver also implements several axioms that express facts about 
C expressions with pointers by searching the set of input conjuncts and adding 
additional conjuncts. 

2.1 Harvey & Stuckey’s Decision Procedure 

Harvey & Stuckey’s procedure is a complete algorithm with polynomial time 
complexity that can be used to solve conjuncted integer constraints. In this 
procedure each input constraint is expected to be of the form ax + by < d 
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where x and y are variables, d is an integer and a and b must be elements of 
the set { — 1,0, 1}. Harvey & Stuckey call this subset the unit two-variable per 
inequality (UTVPI) constraints. Note that Zapato uses homogenization and 
term rewriting to remove terms that contain both arithmetic and uninterpreted 
function symbols. Zapato also rewrites arithmetic terms into canonical form. 
For example, x < y is converted into la; + (— l)y < —1. 

The algorithm is based on iterative transitive closure: as each conjunct is 
added into the system, all possible consequences of the input are computed. 
Several strengthening transition rules on the state of the set of known facts 
are applied at each conjunct introduction to produce the complete closure. In 
the case that the conjuncts are unsatisfiable, a conjunct and its negation will 
eventually exist in the set of facts. In the case that no contradiction is found, 
the constraints must be satisfiable. 

An interesting aspect to note about Harvey & Stuckey’s algorithm is that it 
can be used to prove the unsatisfiability of sets of constraints which may include 
non-UTVPI terms. For example, it can prove the unsatisfiability of 

n>k'-\-nAx<y'Av-\-5<lAy'<zA ~^{x <z)Ak<v-\-z 

In these cases we can simply prune away the conjuncts that are not UTVPI 
terms because the non-UTVPI constraints (such as k < v z) may not factor 
into the reason for unsatisfiability. The above expression is unsatisfiable because 
the following is unsatisfiable: x < y' Ay' < z A ~^{x < z). This allows us to solve 
many queries that may appear at first to be non-UTVPI. 

The logic supported by Zapato dictates that Slam’s predicates are always 
UTVPI constraints with the possibility of uninterpreted function symbols. This 
is due to the fact that Slam’s Newton module only adds new predicates in 
the case that validity is proved by the theorem prover. At first this may sound 
disastrously limiting. However, it is not. Both Newton and Constrain can 
refine the abstraction if they can find any spurious transitions in the path that 
they are examining. Paths are often spurious for multiple reasons. So long as 
at least one of those reasons can be proved valid using Zapato, the refinement 
algorithm will make progress. In the worst case, Zapato’s limited arithmetic 
support could cause Slam to report a false bug. In the 10 months that we have 
been using UTVPI in Zapato we have only seen this occur once while model 
checking real device drivers. 

The completeness of Harvey & Stuckey’s procedure allows us to prove the 
validity of queries other automatic provers struggle with. Consider, for example, 
the validity ofx<3Aa;>l=J>a; = 2. Because Verifun uses a decision 
procedure for the real numbers the only way that this can be proved is through 
the use of special purpose heuristics or with additional axioms. 

3 Experimental Results 

To evaluate the effectiveness of Zapato we used Slam to check 82 safety proper- 
ties of 28 Windows OS device drivers and recorded statistics about the theorem 
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prover queries. In total, Slam passed 1,748,580 queries to Zapato. Note that 
Slam caches the interface to the prover. These 1,748,580 queries are the cases in 
which a pre-computed result could not be found in the cache. Of these queries, 
148,509 were proved valid by Zapato. Zapato resolved the remaining queries 
as invalid. 607,251 queries caused Zapato, at least on one occasion, to invoke 
Harvey & Stuckey’s decision procedure on a set of conjuncts with a non-UTVPI 
term. Of those 607,251 queries, Zapato was still able to prove 62,163 valid. 

To compare Harvey & Stuckey’s algorithm versus a more traditional arith- 
metic decision procedure we also compared the functional result of Zapato to 
Simplify on the benchmark queries. Simplify was able to prove 248 queries 
valid that Zapato could not. Zapato proved 560 queries valid that Simplify 
could not. Because the provers support different logics, a performance compari- 
son is difficult to analyze. But generally, as also reported by Flanagan et al. [5], 
the new lazy PL SAT-based approach scales better on larger benchmarks. In the 
case of the hardest invalid query from the benchmarks, Zapato’s runtime was 
8 seconds while Simplify’s was 10 seconds. Both Simplify and Zapato were 
able to prove the hardest valid query in 3 seconds. 

4 Conclusions 

Zapato’s architecture is based on the prover Verifun. What makes Zapato 
unique is that it does not use Fourier & Motzkin or Simplex as its decision proce- 
dure for arithmetic. Instead, it uses a decision procedure proposed by Harvey & 
Stuckey which has better worst-case asymptotic complexity and a completeness 
result for a useful subset of arithmetic terms over the integers. The expressive- 
ness of this subset appears to be sufficient for the verification of Windows device 
drivers using Slam. We are currently integrating the Simplex decision procedure 
into Zapato, which we envision could be optionally combined with Harvey & 
Stuckey’s algorithm in applications where full linear arithmetic is required. 
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Abstract. JNuke is a framework for verification and model checking of Java 
programs. It is a novel combination of run-time verification, explicit-state model 
checking, and counter-example exploration. Efficiency is crucial in dynamic ver- 
ification. Therefore JNuke has been written from scratch in C, improving per- 
formance and memory usage by an order of magnitude compared to competing 
approaches and tools. 



1 Introduction 

Java is a popular object-oriented, multi-threaded programming language. Verification 
of Java programs has become increasingly important. Dynamic analysis, including run- 
time verification and model checking, has the key advantage of having precise informa- 
tion available, compared to classical approaches like theorem proving and static anal- 
ysis. There are fully automated dynamic analysis algorithms that can deduce possible 
errors by analyzing a single execution trace [1, 7, 14]. 

Dynamic analysis requires an execution environment, such as a Java Virtual Ma- 
chine (VM). However, typical Java VMs only target execution and do not offer all re- 
quired features, in particular, backtracking and full state access. Code instrumentation, 
used by JPaX, only solves the latter problem [9]. JNuke, our run-time verification and 
model-checking framework, contains a specialized VM allowing both backtracking and 
full access to its state. Custom checking algorithms can be implemented. 

Related work includes software model checkers that apply directly to programs, 
for example, the Java PathFinder system (JPF) developed by NASA [17], and similar 
systems for checking Java programs [4, 7, 12, 13] and other software [6, 8, 10, 1 1]. JPF, 
as a comparable system, is written in Java. Hence it effectively uses two layers of VMs, 
the system layer and its own. Our benchmarks show that JNuke is more efficient. 

2 System Design 

The static part of JNuke includes a class loader, transformer and type checker, including 
a byte code verifier. When loading a Java class file, JNuke transforms the byte code 
into a reduced instruction set derived from the abstract byte code in [16], after inlining 
intra-method subroutines. Additionally, registers are introduced to replace the operand 
stack. A peep-hole optimizer takes advantage of the register-based byte code. Another 
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component can capture a thread schedule and use code instrumentation to produce class 
files that execute a given schedule on an arbitrary VM or Java debugger [15]. 

At the core of JNuke is its VM, providing check-points [5] for explicit-state model 
checking and reachability analysis through backtracking. A check-point allows explo- 
ration of different successor states in the search, storing only the difference between 
states for efficiency. Both the ExitBlock and ExitBlockRW [2] heuristics are available 
for schedule generation. These algorithms reduce the number of explored schedules 
in two ways. First, thread switches are only performed when a lock is released, thus 
reducing interleavings. Second, the RW version adds a partial-order reduction if no 
data dependencies are present between two blocks in two threads. If no data races are 
present, behavioral equivalence is preserved. The supertrace algorithm [10] can be used 
to reduce memory consumption. For generic run-time verification, the engine can also 
run in simulation mode [17], where only one schedule defined by a given scheduling al- 
gorithm is executed. Event listeners can implement any run-time verification algorithm, 
including Eraser [14] and detection of high-level data races [1]. 

For portability and best possible efficiency, JNuke was implemented in C. A light- 
weight object-oriented layer has been added, allowing for a modern design without 
sacrificing speed. We believe that the choice of C as the programming language was not 
the only reason for JNuke’s competitive performance. The ease of adding and testing 
optimizations through our rigorous unit testing framework ensured quality and effi- 
ciency. The roughly 1500 unit tests make up half of the 120,000 lines of code (LOC). 
Full statement coverage results in improved robustness and portability. JNuke runs on 
Mac OS X and the 32-bit and 64-bit variants of Linux (x86 and Alpha) and Solaris. 



3 Experiments 

The following experiments were used to compare JNuke to JPaX and JPF: two task- 
parallel applications, SOR (Successive Over-Relaxation over a 2D grid), and a Trav- 
elling Salesman Problem (TSP) application [18], a large cryptography benchmark [3] 
and two implementations of well-known distributed algorithms [5]; For Dining Philoso- 
phers, the first number is the number of philosophers, the second one the number of 
rounds. Producer/Consumer involves two processes communicating the given number 
of times through a one-element buffer. The experiments emphasize the aim of applying 
a tool to test suites of real-world programs without manual abstraction or annotations. 

All experiments were run on a Pentium III with a clock frequency of 733 MHz and 
256 KB of level II cache. Table 1 shows the results of run-time verification in simulation 
mode. Memory limit was 1 GB. Benchmarks exceeding it are marked with “m.o.”. 

The columns “Eraser” refer to an implementation of the Eraser [14] algorithm. 
Mode (1) distinguishes between accesses to individual array elements and is more pre- 
cise. Mode (2) treats arrays as single objects and therefore requires much less memory 
for large arrays. Columns “VC” refer to the view consistency algorithm [1] used to 
detect high-level data races. Again, modes (1) and (2) concern array elements. 

For comparison, the run times using the JPaX [9] and JPF [17] platforms are given. 
In the JPaX figures, only the time required to run the instrumented program is given, 
constituting the major part of the total run time. This refers to both the view consistency 
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Table 1. Execution times for run-time verification in simulation mode, given in seconds. 



Application 


Sun’s VM 


JNuke 

VM 


JNuke Eraser 


JNuke VC 


JPaX 


JPF 


JIT 


no JIT 


1 


2 


1 


2 


2 


2 


SOR 


0.7 


0.7 


3.6 


934.1 


21.5 


34.0 


19.2 


45.9 


error 


TSP, size 4 


0.6 


0.4 


0.7 


1.7 


1.5 


1.2 


1.1 


2.7 


m.o. 


TSP, size 10 


0.6 


0.4 


1.8 


10.0 


9.3 


9.1 


8.4 


56.3 


m.o. 


TSP, size 15 


0.8 


1.2 


24.5 


228.7 


203.0 


207.0 


192.7 


1109.5 


m.o. 


JGFCrypt A 


6.6 


19.1 


415.0 


m.o. 


1667.7 


m.o. 


1297.7 


m.o. 


m.o. 


Dining Phil. (DP 3 5000) 


1.2 


1.9 


11.0 


15.7 


15.6 


987.0 


987.0 


83.2 


m.o. 


Prod./Cons. (PC 12000) 


1.6 


1.5 


5.6 


8.1 


8.1 


71.8 


71.8 


error 


m.o. 



Table 2. Results for model checking. Time is in seconds, no. of instructions in thousands. 





1 JNuke (EB/RW) 


1 JNuke (EB) 


1 JPF (lines) 


1 JPF (byte codes) | 




time 


#ins 


#ins/s 


time 


#ins 


#ins/s 


1 time 


#ins #ins/s 


1 time 


#ins 


#ins/s 1 


DP 2 10 


4.8 


171 


35.8 


7.9 


323 


40.7 




503 


5.9 








DP 3 1 


0.2 


3 


16.7 


1.7 


60 


35.3 


1 29.7 


151 


5.1 






2.9 


DP 3 2 


0.5 


11 


23.9 


20.8 


693 


33.4 








t.o. 


t.o. 


t.o. 


DP 3 3 


6.5 


190 


29.1 


99.1 


2,992 


30.2 


1 597.6 


3,670 


6.1 


t.o. 


t.o. 


t.o. 


PC 100 


0.9 


67 


72.8 


1.0 


80 


78.4 




279 


5.8 






3.5 


PC 1000 


9.1 


661 


72.3 


10.0 


794 


79.2 


1 409.4 


2,795 


6.8 


1 t.o. 


t.o. 


t.o. 



and Eraser algorithms, which can use the same event log. JPaX currently only offers 
mode (2). View consistency is currently not implemented in JPF [17]. In its simulation 
mode, it ran out of memory after 10-20 minutes for each benchmark. 

Generally, JNuke uses far less memory, the difference often exceeding an order of 
magnitude. Maximal memory usage was 66.7 MB in the SOR benchmark in mode (2). 
Analyzing a large number of individual array entries is currently far beyond the capacity 
of JPaX and JPF. Certain benchmarks show that the view consistency algorithm would 
benefit from a high-performance set intersection operation [14]. Because file I/O is not 
available in JPF, most benchmarks required manual abstraction for JPF. 

The focus of our experiments for explicit-state model checking capability is on per- 
formance of the underlying engines, rather than on state space exploration heuristics. 
Therefore correct, relatively small instances of Dining Philosophers and Prod./Cons. 
from [5] were chosen where the state space can be explored exhaustively. JPF supports 
two modes of atomicity in scheduling, either per instruction or per source line. JNuke 
offers both ExitBlock and ExitBlockRW (EB and EB/RW, [2]) relying mainly on lock 
releases as boundaries of atomic blocks. Hence, a direct performance comparison is 
difficult. For this reason. Tab. 2 provides instructions per second in addition to absolute 
numbers. Timeout (“t.o.”) was set to 1800 s. 

JNuke often handles 5-20 times more instructions per second than JPF. Atomicity 
in scheduling has a large impact, as shown by the number of instructions executed for 
different heuristics. In absolute terms, JNuke is usually an order of magnitude faster. 
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4 Conclusions and Future Work 

JNuke implements run-time verification and model checking, both requiring capabili- 
ties that off-the-shelf virtual machines do not offer. Custom virtual machines, however, 
should achieve a comparable performance, as JNuke does. Scheduler heuristics and 
verification rules can be changed easily. JNuke is more efficient than comparable tools. 

Future work includes a garbage collector and a just-in-time compiler. The segmen- 
tation algorithm [7] would reduce false positives. Static analysis to identify thread-safe 
fields will speed up run-time analysis. Finally, certain native methods do not yet allow 
a rollback operation, which is quite a challenge for network operations. 
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Abstract. Our aim is to validate mission-specific components of spacecraft 
flight software designs that are specified using state-charts and translated auto- 
matically to the final flight code for the mission. We established an automatic 
translation tool set from state-charts to the input language of SPIN for the valida- 
tion of such mission-specific components. To guarantee compliance with auto- 
generated flight code, our translation tool set preserves the StateFlow® seman- 
tics. We are now able to specify and validate portions of mission-critical soft- 
ware design and implementation using the exhaustive exploration techniques of 
model checking. 

Keywords: state-charts, model checking, translation, Stateflow, SPIN 



1 Introduction 

The HiVy Tool Set enables model checking for state-charts [1]. This is achieved by 
translating state-chart specifications into Promela, the input language of the Spin 
model checker [2]. The HiVy Tool Set transforms output of the commercial tool 
Stateflow® provided by The Mathworks. HiVy can also be used independently from 
Stateflow. An abstract syntax of hierarchical sequential automata (HSA) is provided 
as an intermediate format for the tool set [3]. The HiVy Tool Set programs include 
SfParse, sflhsa, hsalpr and the HSA merge facility. 

Rationale 

The authors of Stateflow adopted the graphical notation of state-charts as proposed by 
D. Harel [4] but designed a different semantics to this notation. The Statemate by 
ILogix tool supports the original semantics developed by D. Harel and there are some 
advances that extend Statemate specified designs to model checking facilities [5], [6]. 
The following partial list illustrates differences in semantic design between Stateflow 
and Statemate state-charts, which make clear that Statemate-based tools cannot be 
used for Stateflow state-chart verification. 
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• In Stateflow semantics there is at most one event active at a time. In Stalemate 
semantics any finite number of events are allowed. 

• Emitting an event in Stateflow semantics means to pass control to the receiver 
chart of the event at the moment of emitting the event. In Statemate semantics 
events are collected until the end of the step and then broadcast to the chart. 

• In Stateflow the execution order within an AND-state is determined by the graphi- 
cal placement of the AND-composed charts. In Statemate all AND-states are exe- 
cuted simultaneously. 

• In Stateflow semantics the effect of changing variables takes place immediately. 
In Statemate semantics a variable change takes effect only at the end of the step. 

We propose a new format called hierarchical sequential automata (HSA) that accu- 
rately reflects the semantics of Stateflow for implementation within the HiVy Tool 
Set. Each state-chart is associated with an equivalent hierarchical sequential automa- 
ton that consists of a finite set of cooperating sequential automata that can be imple- 
mented as parallel processes in Promela. State-chart states, events and variables are 
encoded as Promela variables and Promela processes change the values of these vari- 
ables in order to simulate state changes, event generation and variable changes ac- 
cording to the semantics of Stateflow. The observable behavior is defined with re- 
spect to the variables representing state-chart states, events and variables. These ideas 
are very close to those for translating Statemate state-charts to Promela as presented 
in [6]. 

2 Overview of the Tool Set 

Constructing a State-Chart 

A state-chart model of the system to be verified must first be constructed. Access to 
and general familiarity with the Stateflow application is needed. 

Syntactic Restrictions. In order to use the HiVy Tool Set the state-chart must be de- 
signed in a sub-set of the Stateflow language. This sub-set does not include: 

• Implicit event generation* (See Figures 1 & 2) 

• Inner transitions with the same source and destination* 

• Transition actions on transition segments that end on a junction 

• History junctions 

* Alternate implementations are supported within HiVy 




The diagram above represents the following state-chart activity. In Chart_l, State_Al is 
active. In Chart_2, State_B is or becomes active. Upon entry into State_B an EVENT is 
internally generated. This EVENT is sensed by Chart_l and causes the transition from 
State_Al to State_A2 to be made. 



Fig. 1. Implicit Event Notation Example 
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The diagram above represents an equivalent execution to that shown in Figure 1. We set 
a variable EVENT = 1 in State_B and create a conditional guard on the transition from 
State_Al to State_A2 that evaluates to TRUE when EVENT==1. This notation is suitable 
for HiVy translation. 

Fig. 2. Equivalent Implicit Event Notation 

It is significant to note that HiVy supports state-chart events generated hy an external 
environment (e.g., emitted non-deterministically from an included Promela if-loop). 

There are additional, simple rules for state-chart model development to support 
successful translation with HiVy that pertain to scoping (dictation of where non- 
graphical object types can exist in the hierarchy), support for embedded state-charts or 
sub-charts, and closed-system verification with Spin. 

Performing Model Translation 

This section describes how to parse Stateflow models to produce the HSA intermedi- 
ate format, merge sub-charted state-charts with their parent charts (if desired), and 
generate Promela input for the Spin model checker. 

Parsing & HSA. Two programs of the HiVy Tool Set: SfParse and sfZhsa are used to 
prepare the model file for translation. If parsing is successful, a file is produced that 
contains an ASCII representation of the abstract syntax tree in HSA-format. HSA is 
given by a set of events, a set of states, a root state, a set of Boolean variables bvars, a 
set of integer variables ivars, a set of states denoting the initial configuration icon/, a 
hierarchy function hi, a typing function ty, a transition map trmap and OR-state map- 
pings to their initial states initmap [3]. 

Merging State-Charts. Models consisting of several files may be merged into one 
HSA file before translating into Promela for Spin using the HiVy program hsacom- 
plete. 

File Generation for SPIN Model Checking. The following files are generated by the 
HiVy translation program hsa2pr: 

• stmodel.pr: the Promela model of the original state-chart. 

• prop_list and propositions: the name list and definitions of propositions. One 
proposition is generated for each state and event. Proposition names are suitable in 
generating linear temporal logic (LTL) properties for verification. 



3 HiVy Metrics and Optimization 

A Plain OT Telephone System (POTS) model was developed to prototype HiVy’s 
translation capability and performance. The Stateflow model contains 36 states and 7 
events. The Promela file containing 1300 LOC (including comments) is immediately 
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produced by executing the HiVy Tool Set programs. The translated model contains 
13 Promela processes. The number of Promela processes directly affects Spin verifi- 
cation performance. To minimize system compute time and memory utilization, it is 
desirable to limit the number of processes in a Spin model. A HiVy optimization 
algorithm that produces a valid translation to a model with a single Promela process is 
currently in development. Preliminary model verification metrics as reported by Spin 
for the original and optimized Promela POTS models are provided in Table 1 below. 



Table 1. POTS Example Metrics 



Parameter 


Original HiVy 


HiVy Optimization 


Reduction 


Real/user time 


3.5 seconds 


1.5 seconds 


57% 


Total actual memory use 


8.87 Mb 


4.67 Mb 


47% 



4 In Conclusion 

The full capability of the Spin model checker may be used to verify models generated 
by HiVy because they yield valid Promela code. The development of our approach 
for verification of NASA Spacecraft Fault Protection designs has been presented 
previously [7], [8]. The validity of HiVy generated models for Spin model checking 
has been prototyped. State-space explosion effects are the significant challenge to the 
translation technique. Optimization approaches are under evaluation with initial re- 
sults showing significant (> 40%) reduction in verification parameter values. 
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Abstract. ObsSlice is an optimization tool snited for the verification 
of timed automata using virtual observers. It discovers the set of mod- 
elling elements that can be safely ignored at each location of the observer 
by synthesizing behavioral dependence information among components. 
ObsSlice is fed with a network of timed automata and generates a trans- 
formed network which is equivalent to the one provided up to branching- 
time observation. Preliminary results have proven that eliminating irrel- 
evant activity mitigates state space explosion and has a positive -and 
sometimes dramatic- impact on the performance of verification tools in 
terms of time, size and counterexample length. 

1 Introduction 

A common practice in the verification of concurrent systems is to express safety 
and liveness requirements as virtual components (observers) and composed in 
parallel with system under analysis (SUA). Our tool ObsSlice, based on [1], is 
fed with a SUA and an observer specified as a network of Timed Automata (TAs) 
and statically discovers, for each observer location, a set of modelling elements 
(automata and clocks) that can be safely ignored without compromising the 
validity of TCTL formulas stated over the observer (i.e., an exact reduction 
method wrt. branching time analysis). Eliminating irrelevant activity seems to 
mitigate state space explosion and have a positive impact on the performance of 
verification tools in terms of time, size and length of counterexamples. 

ObsSlice seems to be well suited for treatment of models comprising several 
concurrent timed activities over observers that check for the presence of event 
scenarios (e.g, [2]). 

2 ObsSlice Architecture 

Figure 1 illustrates in a modular view the way ObsSlice solves this slicing prob- 
lem by combining concepts presented in [1]. Currently, the tool takes a network 

* Research partially supported by UBACyT 2004 X020, ANCyT grant BID 1201/OC- 
AR PICT 11738, Microsoft Research Embedded Innovation Excellence Award 
** Research partially supported by UADE projects ISI03B and TSI04B 
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Fig. 1. ObsSlice architecture 



of TAs compatible with Kronos [3] and OpenKronos [4] formats and an I/O 
classification of the labels in the TAs. The main goal of Relevance Calculator 
is to estimate, for each observer location, a set of components and clocks whose 
activity can be safely ignored during the observed evolution of the SUA. To 
achieve that goal, it relies on Pair Wise Influence Calculator which stati- 
cally calculates if a given component C may influence the behavior of another 
component C" when sojourning a given observer location. Currently, I/O clas- 
sification of events for each automata helps to check potential influence due to 
communication, assignments or predicates. I/O declarations are, in general, in- 
tuitively known by verification engineers or can be automatically provided by 
high-level front-end modelling languages. ObsSlice is robust in the sense that 
a wrongly specified I/O label classification would only compromise the exact- 
ness of the method but reachability results would still be conservative. On the 
other hand. Sojourn Set Calculator provides an over-approximation of the 
set of locations that may be traversed by a given component when sojourning 
an observer location. This information serves as a way to obtain a more precise 
pair-wise influence prediction. Sojourn Set Calculator, by default, performs 
an untimed composition of each of the SUA TAs with the observer. Optionally, 
the sojourn set calculus can be improved by specifying which sets of TAs should 
be composed together due to a suspected synchronous behavior among them 
(synchronous subsystem directives). In addition to SUA TAs, constraint au- 
tomata can be used to compactly express ordering of events in the composed 
system (e.g., FIFO ordering in a communication chain of components). Such 
constraints can be derived through abstraction techniques or directly proposed 
by verification engineers, in order to keep Sojourn Set Calculator from per- 
forming large compositions. 

Finally, the Automata Transformer receives the activity tables and gener- 
ates the network of transformed TAs. The enabling and disabling of modelling 
elements is achieved differently depending on the target dialect. Currently, the 
transformed models can be checked by Kronos [3], OpenKronos [4] and Up- 
PAAL [5]. The reduction is performed through the addition of sleep locations 
and their corresponding transitions (when possible, deactivation of clocks is also 
informed together with the model). 
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Table 1. Examples sizes 



1 SUA 


SUA+0 


Observer | 


Name 


#TAs 


^Clocks 


#Loc 


#Tran 


FddilO 


21 


32 


21 


221 


Piped 


13 


14 


15 


197 


Pipe7 


15 


16 


17 


227 


Remotes . 


12 


13 


29 


395 


MinePump 


8 


8 


9 


58 



3 Experiments 

Being a preprocessing tool based on an exact reduction technique, ObsSlice 
is suited for integration with virtually any verification strategy built in cur- 
rent modelcheking tools. For a discussion on related work, please refer to [1]. 
OpenKronos tool was run using DFS search order and DBMs as state space 
representation when error is known to be reachable. Uppaal was run using 
-Was option and BFS order generating the whole state space when error is not 
reachable, -Was -tl to generate the shortest trace to the error, and -Was -A 
to apply a conservative abstraction (convex hull) for some of the unreachable 
error cases. Table 1 shows the examples sizes: number of TAs of the SUA, clocks 
of SUA-bobserver and the number of locations and transitions of the observers. 
Table 2 shows times and memory consumed by the modelchecking tools over 
the original and “obssliced” models. We also provide the length of the shortest 
trace (including committed synchronization with the observer) and the time con- 
sumed to generate it. Time consumed by ObsSlice itself is not reported since 
it is negligible compared with verification times (less than a couple of seconds) . 

FddilO is an extension of the FDDI token ring protocol similar to the one 
presented in [1] where the observer monitors the time the token takes to return 
to a given station. Piped and Pipe7 are pipe-lines of sporadic processes that 
forward a signal emitted by a quasi periodic source [1] (with 6 and 7 stages 
resp.). The observer captures a scenario violating an end-to-end constraint for 
signal propagation. The rest of examples are designs of distributed real-time 
system generated using the technique presented in [6]. Observers were obtained 
using VTS [2] , a tool that automatically produces timed observer from scenario 
specifications. Remote Sensing [2], is a system consisting of a central compo- 
nent and two remote sensors. Sensors periodically sample a set of environmental 
variables and store their values in shared memory. When the central component 
needs them, it broadcasts a signal to the sensors. Each sensor runs a thread 
devoted to handle this message by reading the last stored value from shared 
memory and sending it back to the central component. The latter pairs the 
readings so that another process can use that piece of information to perform 
certain actions on some actuators. The observer captures scenarios where a re- 
quest for collecting a pair of data items is not fully answered in less than a 
given amount of time. MinePump is a design of a fault-detection mechanism for 
a distributed mine-drainage controller [7]. A watchdog task periodically checks 
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Table 2. Verification benchmarks (Time expressed in seconds, Mem in MB) 
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6.23 

8.60 




o7m 
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81 















the availability of a water level sensing device by sending a request and ex- 
tracting acknowledgements that were received and queued during the previous 
cycle (by another sporadic task). When the watchdog finds the queue empty, it 
registers a fault condition in a shared memory which is periodically read, and 
forwarded to a remote console, by a proxy task. The proposed scenario verifies 
that the failure of the high-low water sensor is always informed to the remote 
operator before a given deadline. Experiments were run on a SunBlade 2000 
with 2GB RAM and show important savings in verification times (even using 
convex hull abstraction^), memory consumed and length of counterexamples. 
A Java version of our tool together with a set of examples can be found at 
http : //www. dc . uba. ar/people/proyinv/rtar/obsslice. 

4 Past and Future 

ObsSlice evolves from a manually- integrated proof of concept tools [1]. Cur- 
rently, it features improved transference algorithms, a weakened set of influence 
rules, clock deactivation rules, and a brand new set of transformation rules that 
work for broadcast model of communication (produces transformed networks 
ready to be checked with Uppaal). Besides, it provides flexible mechanisms to 
guide sojourn set calculus. Future extensions comprise end-to-end support of 
more timed automata dialects. We also plan to extend the concept of influence 
at a finer grain of analysis (not only at the location basis) and to use time in- 
formation to make a more precise analysis of sojourn sets. On the other hand, 
we believe that our abstraction based on activity, can be cheaply performed on- 
the-fly by adapting verification engines. We believe that slicing over observers 
is an idea that may also be applied to other fields such as the verification of 
concurrent and distributed applications where asynchronicity exacerbates state 
space explosion. 
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Abstract. UCLID is a tool for term-level modeling and verification of 
infinite-state systems expressible in the logic of counter arithmetic with 
lambda expressions and uninterpreted functions (CLU). In this paper, 
we describe a key component of the tool, the decision procedure for CLU. 

Apart from validity checking, the decision procedure also provides other 
useful features such as concrete counterexample generation and proof- 
core generation. 

1 Introduction 

Decision procedures for fragments of first-order logic form the core of many 
automatic and semi-automatic verification tools. Applications include micropro- 
cessor verification (e.g., [3]) and predicate abstraction-based software verification 
(e.g. [1]). Decision procedures also find use as components of higher-order logic 
theorem provers, such as PVS [10]. 

UCLID [4,15] is a tool for modeling and verifying infinite-state systems 
expressible in a logic called CLU. The logic is a decidable fragment of first-order 
logic with restricted lambda expressions, uninterpreted functions and equality, 
counter arithmetic (i.e. addition by constants) and ordering (<). Thus, the only 
arithmetic constraints permitted in this logic are of the form Ti t<i T 2 -I- c where 
T\ and T 2 are integer expressions and txi€ {<, =}. 

One of the key components of the tool is an efficient decision procedure for 
CLU. Apart from the logic it handles, there are several distinguishing features 
of the decision procedure that set it apart from other decision procedures such 
as SVC [2], CVC [14] and ICS [6]: 

— Eager translation to SAT: The decision procedure performs a satisfiability- 
preserving translation of the first-order formula to a Boolean formula, which 
in turn is checked with a Boolean Satisfiability (SAT) solver. This is in con- 
trast to other SAT-based procedures (e.g., [5, 14]), which compute a Boolean 
abstraction of the first-order formula and lazily refine the abstraction based 
on inconsistent SAT assignments. In contrast, UCLID performs an eager 
translation. 

— Integer interpretation: Most queries in hardware and software verification 
require using an integer interpretation of symbols. However, most available 
decision procedures are not complete for integers even if one restricts oneself 
to CLU logic. UCLID, on the other hand is complete for integers, which, 
e.g., makes it extremely useful in reasoning about systems with arrays. 

* This research was supported by the Semiconductor Research Corporation, Contract 
RID 1029 and by ARC grant DAAD 19-01-1-0485. 
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— Reducing the domain of interpretation: The decision procedure exploits op- 
timizations that allow it to interpret symbols over smaller domains by an- 
alyzing formula structure. The small model property for CLU permits con- 
sidering only a finite but often small set of values for the integer symbols in 
the formula. This set is further reduced by exploiting positive equality [3,8]. 

The tool has been implemented in Moscow ML and contains around 30K lines 
of code. It can interface to both SAT solvers and BDD packages. The CLU for- 
mulas are internally represented using a directed acyclic graph (DAG) structure 
which facilitates effective sharing of common subexpressions. The DAG storage 
manager uses heuristics to detect and collapse certain semantically equivalent 
but syntactically distinct expressions. 



2 CLU Logic via Examples 

Consider an example of a valid CLU query which contains uninterpreted func- 
tions and lambda expressions for arrays where ITE stands for the if-then-else con- 
struct. Below, the first three lines define temporary names for sub-expressions, 
and the decide command is used to invoke the decision procedure. 

tl := f(a) != f(b) ; 

m’ := Lambda x. ITE(x = a, 0, m(x)) ; (* m’ <- m[a:=0] *) 

t2 := tl => (m’ (b) = m(b)) ; 

decide (t2) ; (* is t2 valid? *) 

Here is an example^ that cannot be modeled using traditional select-update 
arrays, since an arbitrary number of entries in the array m gets updated in a 
single step. 

tl := f(m(b)) = f(m(a)) ; 

m’ := Lambda x. ITE(m(x) < a+1, a, m(x)) ; 

t2 := tl => (m’ (b) = m(b)) ; 

decide (t2) ; (* is t2 valid? *) 

This is an example of an invalid formula. The tool produces a counterexample 
which looks as follows: 

+++ Counter-Examples Found : Formula Not Valid +++ 
a=23, b=32, m(23)=18, m(32)=22, f(22)=3, f(18)=3 

This is a partial interpretation to all the function symbols which are relevant 
to the counterexample. The concrete counterexamples have been found extremely 
useful for debugging and verifying non-trivial systems [9] . 

The logic also supports very limited quantifiers (at the cost of incompleteness) 
at the top-level of a formula. One can assert a universally quantified formula in 
the antecedent while deciding a CLU formula as follows: 

^ The syntax is slightly different for the actual tool. 
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decide ( (FORALL x,y. f (x) = f(y) => x = y) => f(a) != f(a+l)); 

This limited capability has been found very useful in practice, e.g., in au- 
tomating non-trivial proofs for out-of-order processor verification with unboun- 
ded resources [9]. 

3 Decision Procedure 

Operation. The decision procedure performs a series of transformations to re- 
duce a first-order formula to a Boolean formula. The quantifiers are first elim- 
inated using quantifier instantiation techniques [9]. The resulting CLU formula 
is translated to an equi-satisfiable Boolean formula using the following sequence 
of steps: (i) First, lambda expressions are removed using Beta-reduction; (ii) 
Second, function applications are replaced with symbolic constants using opti- 
mizations like exploiting positive equality; (iii) Finally, integer-valued symbolic 
constants are either instantiated over a finite domain (which is sufficient to pre- 
serve satisfiability) or atomic predicates (e.g. x < y + 3) over these symbolic 
constants are encoded using fresh Boolean variables and transitivity constraints 
are imposed [13]. The generated formula is checked using a SAT solver. Since 
the nature of encoding greatly affects the SAT solver’s performance, UCLID 
employs problem-specific hybrid encoding strategies [12] to improve the quality 
of the final encoding. 

Counterexample Generation. The assignment produced by the SAT solver 
over the Boolean variables to an assignment over the first-order symbols in- 
cluding function constants. First, assignments for the integer variables are con- 
structed, and then for each function application, the arguments and the result 
of the application are evaluated from the integer variables that represent them. 
Proof- Core Generation. Many SAT solvers generate an unsatisfiable core of 
Boolean variables. This can be used to generate a proof core for the original 
CLU formula. These variables can be mapped back to atomic predicates in CLU 
logic, since the mappings generated by the translation to SAT are preserved. 
The atomic predicates find use in, for instance, predicate discovery for predicate 
abstraction-based verifiers. 

Benchmarking. We have benchmarked the decision procedure on a diverse set 
of verification benchmarks arising in verifying high-level microprocessor designs, 
cache coherence protocols, model checking software device drivers, and compiler 
validation. UCLID outperforms other decision procedures including SVC and 
CVC on these benchmarks; results may be found in a recent paper [12]. 
Extensions. The decision procedure code has also been used for performing 
symbolic predicate abstraction [7]. Ongoing work includes extending UCLID’s 
logic to include quantifier-free Presburger arithmetic [11]. 
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Introduction 

The specification formalism employed in model checking is usually some flavour 
of temporal or process algebraic language that expresses properties of the be- 
havioural aspects of a system. Knowledge [5] is a modality that is orthogonal 
to the behavioural dimension, capturing properties of information flow. Logics 
of knowledge have been shown to be a useful framework for the analysis of dis- 
tributed algorithms and security protocols, and model checking of these logics 
was first mooted by Halpern and Vardi [6] . Since that time theoretical aspects of 
model checking the logic of knowledge and its combinations with temporal logic 
have been studied [8-10]. The system MCK introduced in this paper implements 
parts of this theory. 

The Model Checking Scenario 

The typical scenario that can be analysed using the system consists of some num- 
ber of agents (which might be players in a game, actors in an economic setting, 
or processes, programs or components in a computational setting) interacting in 
the context of an environment. The agents have the capacity to perform certain 
actions in this environment, which they choose according to their individual pro- 
tocols, or sets of rules. The agents have incomplete information about the state 
of the system due to the fact that they are able to observe only part of the state 
at each instant of time. 

The MCK system can be used to analyse this type of setting by the use 
of model checking techniques. The input to the MCK system describes: (1) the 
environment in which the agents operate, including a formal description of agent 
names, states, initial states, actions and how they affect states, and fairness 
conditions; (2) the protocol for each of the named agents, and a description 
of what parts of the state can be observed by which agents; (3) a number of 

* Work of the first author conducted while employed at UNSW. Work funded by 
an Australian Research Council Discovery grant. National ICT Australia is funded 
through the Australian Government’s Backing Australia’s Ability initiative, in part 
through the Australian Research Council. 
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specification formulas to be model checked, expressing how the agents’ knowledge 
evolves over time. Both the possible state changes selected by the environment 
and the agents’ choices of action may be non-deterministic. 

The MCK system supports several different types of temporal and epistemic 
specifications. In the epistemic dimension, agents may use their observations in 
a variety of ways to determine what they know. In the observational interpre- 
tation of knowledge, agents make inferences about the actual state based just 
on their current observation. In the clock interpretation of knowledge, agents 
compute knowledge using both their current observation and the current global 
clock value. Even more information can be extracted by the agent if it uses a 
complete record of all its observations to date to determine what it knows - 
this is called the synchronous perfect recall interpretation of knowledge. In the 
temporal dimension, the specification formulas may use either linear time tempo- 
ral logic, LTL, or the branching time logic CTL. The system supports different 
combinations of these parameters to different degrees, which in some cases is 
because the implementation remains to be undertaken and in others is due to 
inherent computational difficulties. In the case of the observational semantics, 
for example, the system supports arbitrarily nested combinations of either CTL 
or LTL operators with operators for knowledge and common knowledge, a type 
of fixpoint of the knowledge of many agents. In contrast, the perfect recall se- 
mantics only supports a linear or branching “next time” operator, although the 
theory for the full combination of knowledge with LTL has been developed [9] . 

The custom modelling language used in MCK is designed to cater for some 
specific issues arising from the semantics of knowledge and to allow maximum 
modelling flexibility; the intention is to support experimentation in this compar- 
atively unexplored area rather than verification for a specific language. 



Application Example 

Figure 1 presents an example input file to the system which models a scenario 
where the single agent, a robot called Robot (running the protocol "robot") 
operates in an environment of 8 possible positions. The environment provides 
noisy readings of the position to the robot via the sensor variable, which is 
declared to be an observable input to the agent. The robot’s goal is to halt in 
the region {2.. 4}, which its protocol attempts to achieve by halting when the 
sensor value is > 3. We would like to verify that this is the best the robot 
can do given its observations, which is to say that the condition sensor > 3 
characterises all situations in which the agent knows that it is in the goal region 
when it is running this particular protocol. 

The transitions section represents the effects of the robot’s actions (Halt 
and skip) on the state, by means of a program using non-deterministic if state- 
ments. This program is considered to run atomically in a single tick of the clock. 

The example contains two specification formulas. The construct 
spec_obs_ltl indicates that the formula uses linear time temporal logic op- 
erators and that the knowledge operator Knows is to be interpreted using the 
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type Pos = {0 . . 7} 
position : Pos 
sensor : Pos 
halted : Bool 

init_cond = position == 0 /\ sensor == 0 /\ neg halted 

agent Robot "robot" ( sensor ) 

transitions 

begin 

if Robot. Halt -> halted := True fi; 
if neg halted -> position := position + 1 fi; 
if True -> sensor := position - 1 
[] True -> sensor := position 
[] True -> sensor := position + 1 
fi 
end 

spec_obs_ltl = G (sensor >= 3 <=> Knows Robot position in {2.. 4}-) 
spec_spr_xn = X 2 (sensor >= 3 <=> Knows Robot position in {2.. 4}) 

protocol "robot" (sensor : observable Pos) 
begin 

do neg (sensor >= 3) -> skip 
[] break -> <<Halt>> 

od 
end 



Fig. 1. A Sample MCK input file. 

observational semantics. The operator G expresses truth at all future times. The 
model checker verifies that this formula is indeed true. The use of spec_spr_xn 
in the second specification indicates that the knowledge modality should be in- 
terpreted using synchronous perfect recall, and that the formula has the form 
where (j) is atemporal, expressing that (j) holds in precisely n steps after 
an initial state (n = 2 in this example). The model checker verifies that this 
formula is false. Intuitively, the determinacy of the robot’s motion allows it to 
derive its location from the number of observations made. This means that the 
test sensor >= 3 is not complete — the right-to-left implication fails. If we alter 
the transitions part to allow position to either increment or remain static 
and add a fairness condition to ensure progress, this formula becomes true, but 
is still false for n = 3. 

Implementation Sketch 

The system constructs a BDD representation of a data structure that encodes the 
knowledge set of an agent, i.e. the set of states that are consistent with its local 
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state. For the observational and clock semantics this structure is simply a set 
of states, but the synchronous perfect recall semantics requires a function that 
maps a fixed, finite sequence of observations to the set of final states of traces 
consistent with this sequence of observations, as detailed in [10]. The system is 
implemented in Haskell, and relies on David Long’s HDD package, implemented 
in C, for efficient HDD operations. 



Related Work 

While other systems in recent years have implemented model checking for the 
logic of knowledge, MCK is unique in its support of a range of knowledge se- 
mantics and choices of temporal language. Additionally, fairness constraints and 
the common knowledge operator have not been treated in prior work, and the 
clock semantics has not previously been implemented. 

The theoretical basis for MCK is largely described in [8-10], which employs 
LTL as the temporal language. Theory for model checking a combination of the 
perfect recall semantics and CTL is discussed in [2, 11]. 

Wooldridge and van der Hoek [7] use a connection between knowledge and the 
notion of local proposition, introduced in [3] , to reduce model checking knowledge 
with respect to the observational semantics to temporal model checking in SPIN. 
This reduction works for positive occurrences of knowledge operators but leads 
to an explosion in the number of temporal formulas that need to be checked 
when there are negative occurrences. MCK handles negative occurrences directly 
without incurring such an explosion. 

Our work is motivated more by issues of information flow in distributed 
systems than by distributed artificial intelligence concerns. Other work in the 
Distributed AI literature on model checking multi-agent systems, such as [1, 
4], use epistemic modalities such as “belief” that are not given an information 
theoretic semantics as in Halpern and Moses [5]. The same remark applies to 
related work in the literature on cryptographic protocol verification. 

Conclusion 

MCK can be downloaded at http://www.cse.unsw.edu.au/~mck. The system 
is under active development and other instances of the known algorithms for 
model checking knowledge and time will be added in due course. 
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1 Introduction 

The ZING project is an effort to build a ffexible and scalable model checking in- 
frastructure for concurrent software. The project is divided into four components: 

(1) a modeling language for expressing concurrent models of software systems, 

(2) a compiler for translating a zing model into an executable representation of 
its transition relation, (3) a model checker for exploring the state space of the 
ZING model, and (4) model generators that automatically extract zing models 
from programs written in common programming languages. 

The goal is to preserve as much of the control-structure of the source program 
as possible, zing’s model checker exploits the structure of the source program, 
which is preserved in the model, to optimize systematic state space exploration. 
We believe that such an infrastructure is useful for finding bugs in software 
at various levels: high-level protocol descriptions, work-flow specifications, web 
services, device drivers, and protocols in the core of the operating system. 

2 Architecture 

We believe that the following features capture the essence of modern concurrent 
object oriented languages, from the point of building sound abstractions for 
model checking: (1) procedure calls with a call-stack, (2) objects with dynamic 
allocation, and (3) processes with dynamic creation, using both shared memory 
and message passing for communication. We designed zing’s modeling language 
to have exactly these features, zing supports a basic asynchronous interleaving 
model of concurrency with both shared memory and message passing. In addition 
to sequential flow, branching and iteration, zing supports function calls and 
exception handling. New threads are created via asynchronous function calls. An 
asynchronous call returns to the caller immediately, and the callee runs as a fresh 
process in parallel with the caller. Primitive and reference types, and an object 
model similar to C# or Java is supported, although inheritance is not supported. 
ZING also provides features to support abstraction and efficient state exploration. 
Any sequence of statements (with some restrictions) can be bracketed as atomic. 
This is essentially a directive to the model checker to not consider interleavings 
with other threads while any given thread executes an atomic sequence. Sets are 
supported, to represent collections where the ordering of objects is not important 
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(thus reducing the number of potentially distinct states zing needs to explore). 
A choose construct is provided that can be used to non-deterministically pick 
an element out of a finite set of integers, enumeration values, array members or 
set elements. An example zing model that we extracted from a device driver, 
and details of an error trace that the tool found in the model can be found in 
our technical report [1] . 

Building a scalable model checker for the zing modeling language is a huge 
challenge since the states of a zing model have complicated features such as 
processes, heap and stack. We designed a lower-level model called zing object 
model (or zom), and built a ziNG compiler to convert a zing model to ZOM. 
The compiler provides a clear separation between the expressive semantics of 
the modeling language, and a simple view of zom as labeled transition systems. 
This separation has allowed us to decouple the design of efficient model checking 
algorithms from the complexity of supporting rich constructs in the modeling 
language. 

3 Model Checker 

The zing model checker executes the ZOM to explore the state space of the 
corresponding zing model. Starting from the initial state, the model checker 
systematically explores reachable states in a depth- first manner. The biggest 
technical challenge, as with any model checker, is scalability. We have imple- 
mented several techniques that reduce the time and space required for state 
exploration. 

Efficient State Representation. We observe that most state transitions mod- 
ify only a small portion of the zing state. By only recording the difference be- 
tween transitions, we greatly cut down the space and time required to maintain 
the depth-first search stack. To further cut down on the space requirements, the 
model checker stores only a fingerprint of an explored state in its hash table. We 
use Rabin’s finger-printing algorithm [3] to compute fingerprints efficiently. 
Symmetry Reduction. A zing state comprises the thread stacks, the global 
variables, and a heap of dynamically allocated objects. Two states are equivalent 
if the contents of the thread stacks and global variables are identical and the 
heaps are isomorphic. When the model checker discovers a new state, it first 
constructs a canonical representation of the state by traversing the heap in a 
deterministic order. It then stores a fingerprint of this canonical representation 
in the hash table. 

Partial-Order Reduction. We have implemented a state-reduction algorithm 
that has the potential to reduce the number of explored states exponentially. 
This algorithm is based on Lipton’s theory of reduction [12]. Our algorithm is 
based on the insight that in well-synchronized programs, any computation of 
a thread can be viewed as a sequence of transactions, each of which appears 
to execute atomically to other threads. During state exploration, it is sufficient 
to schedule threads only at transaction boundaries. If programmers follow the 
discipline of protecting each shared variable with a lock, then these transactions 
can be inferred automatically [6] . These inferred transactions reduce the number 
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of interleavings to be explored, and thereby greatly alleviate the problem of state 
explosion. 

Summarization. The ability to summarize procedures is fundamental to build- 
ing scalable interprocedural analyses. For sequential programs, procedure sum- 
marization is well-understood and used routinely in a variety of compiler opti- 
mizations and software defect-detection tools. This is not the case for concurrent 
programs, zing has an implementation of a novel model checking algorithm for 
concurrent programs that uses procedure summarization as an essential com- 
ponent [13]. Our method for procedure summarization is based on the insight 
about transactions mentioned earlier. We summarize within each transaction; 
the summary of a procedure comprises the summaries of all transactions within 
the procedure. The procedure summaries computed by our algorithm allow reuse 
of analysis results across different call sites in a concurrent program, a benefit 
that has hitherto been available only to sequential programs. 

Compositional Reasoning. Stuck-freedom is an important property of dis- 
tributed message-passing applications [14,7]. This property formalizes the re- 
quirement that a process in a communicating system should not wait indefinitely 
for a message that is never sent, or send a message that is never received. To en- 
able compositional verification of stuck-freedom, we have defined a conformance 
relation < on processes with the following substitutability property: If / < S' and 
P is any environment such that the parallel composition P j S is stuck-free, then 
P j / is stuck-free as well. Substitutability enables a component’s specification 
to be used instead of the component in invocation contexts, and hence enables 
model checking to scale. We have implemented a conformance checker on top of 
ZOM to verify the relation I < S, where I and S are ziNG models. 

4 Related Work 

Model checking is an active research area [4]. The spin project [9] pioneered 
explicit-state model checking of concurrent processes. The spin checker ana- 
lyzes protocol-descriptions written in the promela language. Though promela 
supports dynamic process creation, it is difficult to encode concurrent software 
in PROMELA due to absence of procedure calls and objects. Efforts have been 
made to “abstract” C code into promela [10] to successfully find several bugs 
in real-life telephone switching systems, though no guarantees were given as to 
whether the generated promela model is a sound abstraction of the C code. 
Over the past few years, there has been interest in using SPiN-like techniques 
to model check software written in common programming languages, dspin was 
an effort to extend spin with “dynamic” software-like constructs [11]. Model 
checkers have also been written to check Java programs either directly [18, 17, 
15] or by constructing slices or other abstractions [5]. Unlike zing, none of these 
approaches exploit program abstractions such as processes and procedure calls 
to do modular model checking, zing supports a notion of conformance between 
two zing models [7]. This feature of zing is related to, but distinct from, the re- 
finement checking feature of the fdr model checker [16, 8]. The SLAM project [2] 
has similar goals to zing in that it works by extracting sound models from C 
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programs, and checking the models. SLAM has been very successful in checking 
control-dominated properties of device drivers written in C. Unlike zing, it does 
not handle concurrent programs, and it is unable to prove interesting properties 
on heap-intensive programs. 
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Abstract. We present in this article the features of the model-checker 
we have developed: Mec 5. This tool makes it possible to handle mod- 
els written in the AltaRica language and is characterized by the great 
expressiveness of its specification logic: /r-calculus on relations with first 
order quantifiers and equality. 

Keywords: AltaRica, BDDs, model-checking, /r-calculus 

1 Introduction 

Mec 5 [10] is a model-checker for finite AltaRica [9, 2] models. The way to specify 
properties in Mec 5 consists in defining relations in a specification language we 
describe in section 2. In this setting, one can verify that a system satisfies a 
given property by testing that the set of states which violate the property is 
empty. This specification language is very expressive and allows the definition 
of complex relations between different models, e.g. bisimulation. Mec 5 is also 
open to more general problems like controller synthesis because it can compute 
winning strategies in parity games. 

AltaRica is a rich formalism developed at the LaBRI jointly with industrial 
partners. Its goal is to provide a formalism with clear semantics, supported by 
a language offering powerful modeling facilities, itself supported by many tools 
in order to perform safety analysis of a given AltaRica model using several dif- 
ferent techniques. Compilers exist to produce fault trees, Markov chains, Lustre 
programs, transitions systems. 

The need for an AltaRica model-checker was real because the availability 
of commercial tools to model in AltaRica brings libraries of components whose 
correctness must be verified. 

2 Specification Language 

The specification language used by Mec 5 coincides exactly with Park’s /r- 
calculus [8] which is first order logic extended with fixpoints over relations. 

First-Order Logic 

The propositions are built using the traditional boolean connectives (~ for nega- 
tion, & for conjunction, I for disjunction). Variables range over finite domains 
like booleans (bool), finite intervals of integers ([0, 10]) or enumerations of 
constants ({on, off}). 

* Currently a post-doc at the University of Warsaw in the European RTN “GAMES” . 
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A relation can be used in this context as a predicate, i.e. like the characteristic 
function of the relation it represents, returning a boolean value (R(x, 2)). 

The introduction of first-order quantifiers in our language allows us to dispose 
of the existential and universal modalities “/or all successors. . and “there 
exists a successor such that. . . ” whose semantics are usually given explicitly and 
which are usually the only link between the specification language and the model 
under study. We use a concrete syntax which reminds those modalities: 3x.p is 
written <x>p and Vx.p is written [x] p. 

Fixpoints over Relations 

Given a monotonic function over relations, it is possible to compute the least (+=) 
or the greatest (-=) relation which is a fixpoint of this function. For example, 
computing the transitive closure T of a relation R can be written verbatim like 
this in Mec 5 using a least fixpoint: 

T(x, y) += R(x, y) I <z>(R(x, z) & T(z, y)); 

Properties depending on several models are easy to express. Assuming an 
equivalence relation eq(a,b) on labels, bisimulation can be written like this: 
bisim(s,s’) -= 

( [e] [t] (R(s , e , t)=Xe ’ ><t ’>(R’ (s’ ,e’ ,t’ )&eq(e , e ’ )&bisim(t , t ’))))& 
([s’] [t ’] (R’ (s ’ , e ’ , t ’ )=><eXt> (R(s , e , t)&eq(e , e ’ )&bisim(t , t ’)))); 
It is also possible to use several interdependant fixpoint definitions by means 
of systems of equations [6], which gives Mec 5 the full power of /i-calculus. 

Implementation 

Mec 5 uses Binary Decision Diagrams [3] to represent relations. This choice fits 
exactly our specification language, because boolean as well as first order opera- 
tions are efficient on BDDs. The test for equality can be done in constant time, 
which is valuable when computing fixpoints. Expressions which range over finite 
domains are handled with vectors of BDDs. As a side note, our implementation 
benefits currently from a few performance improvements: BDDs use negative 
edges so that negation can be computed in constant time, and the boolean vari- 
ables used to encode a relation are interleaved and we are working on wiser 
heuristics to reduce the memory footprint. 

3 Integration with AltaRica 

The AltaRica formalism is based on constraint automata and provides facilities 
to allow the modeling of complex systems. An AltaRica node can be defined 
hierarchically, by first modeling small parts of the system (themselves nodes) 
and then gathering them as sub-nodes. The default semantics for sub-nodes is 
to run asynchronously, unless some events are explicitly synchronized. Indeed, 
two means of communication are provided: 

Memory Sharing. The assertion of a node can relate local variables to vari- 
ables of its sub-nodes and can be seen as an invariant of the node. By forcing 
two variables to be equal, it is easy to share information between nodes; more 
complex constraints can be used. (A.fl = B.f) & (A.f2 < C.f) 
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Events Synchronization. Synchronization vectors <A . a , B . b> specify which 
events should be bundled together, preventing them from occurring indepen- 
dently, and allowing a transition to fire all the specified events in parallel. 

Other mechanisms provided in AltaRica include the ability to specify priority 
constraints on events a<b as a partial order. Broadcast vectors <a,b?> extend 
synchronization vectors by allowing certain events (those with a ? mark) not to 
occur while maximizing the number of fired events: <a,b?> = {<a>} < {<a,b>}. 

The tight coupling between the AltaRica description language and Mec’s 
specification language was one of our primary goals due to our need for an 
efficient model-checker in the AltaRica community. This was done by using the 
same basic types in the two languages (booleans, intervals and enumerations), 
and using the same concrete syntax wherever possible for the definition of new 
domains, for expressions and for type expressions. It makes Mec 5 a coherent 
tool. 

Every AltaRica model A loaded in Mec 5 defines two types and two relations : 
the type of its configurations A ! c, the type of its event vectors A ! ev, its transition 
relation A ! tCA ! cx A ! evxA ! c and its set of initial configurations A!initCA!c 
which is the set of all configurations by default. Given these new objects, any 
property on A can be expressed in Mec 5’s specification language. 



4 Example 

We give a very simple AltaRica model which is a loop from which it is possible to 
escape non-deterministically; we define the product of two such loops and then 
compute the set of states from which it is unavoidable to go to a dead state: 



NotEpsilonCe : mainlev) := ~(e.Sl. = "" & e.S2. = 

UnavDead(c) += 

[e] [c ’ ] ( (main ! t (c , e, c’) & NotEpsilon(e) ) => UnavDead(c ’ ) ) ; 



node LoopExit 

state s : [0, 2] ; 
event a, b; 



trans s = 0 


1 - a 


-> 


s 


:= 1; 


s = 1 


1- b 


-> 


s 


:= 0; 


s = 1 


1- b 


-> 


s 


:= 2; 



edon 

node main 

sub SI, S2 : LoopExit; 
sync <Sl.b, S2.b>; 

edon 

AltaRica description 




({Sl.s = 2, S2.S = 2» 
({Sl.s = 1, S2.S = 2» 
({Sl.s = 0, S2.S = 2» 
({Sl.s = 2, S2.S = 1» 
({Sl.s = 2, S2.S = 0» 

Result 
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5 Conclusion 

Mec 5 currently uses common techniques like Binary Decision Diagrams, and 
in that sense is very similar to tools like SMV [7] or NuSMV [4]. However, it 
departs from these model-checkers because it provides a more powerful logic that 
is not specifically designed for model-checking. In this respect, Mec 5 is closely 
related to Toupie [5] which implements Park’s /i-calculus with decision diagrams 
but does not provide the means to load a model. 

The experiments we made show that the specification language, although 
unusual at first glance, is extremely versatile and expressing properties with it 
is quite easy. Mec 5 and examples of AltaRica models are available from the 
following URL: http://altarica.labri.fr/ 

The ongoing evolution of Mec 5 follows two lines: performance improvement 
and implementation of satellite facilities to help in the verification process (either 
in the core of the tool, e.g. manipulating traces of executions, or in a graphical 
interface, e.g. a simulator). We expect to improve performance by using the 
knowledge we can extract from an AltaRica model to choose a good variable 
ordering for the BDDs, and we are investigating more symbolic methods which 
would delay the computation of BDDs, in the spirit of what was done with 
Boolean Expression Diagrams [1] . 
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Abstract. We introduce an integrated tool for implementing and play- 
ing various diagnostic games. The tool uses a semantics hierarchy intro- 
duced in [6] to improve code sharing among various diagnostic games 
and reduce the cost of introducing a new game. PlayGame synthesizes 
the winning strategy using the evidence that is an abstract and uniform 
encoding of the proof computed by a checker, and hence instead of re- 
lying on any particular checker the tool works on a variety of checkers 
that can be extended to produce such evidence. PlayGame implements 
a /i-calculus game and a full range of equivalence/preorder games on the 
Goncurrency Workbench-New Gentury (GWB-NG). 



1 Introduction 

Games have been used in the verification community to model verification prob- 
lems, to seek better solutions, and to understand verification results. The early 
work by Stirling [5] on bisimulation games and ^-calculus games unveils the 
potential of such games as diagnostic routines. In a diagnostic game the user 
competes with the computer to show that the verification result is incorrect. By 
losing each and every play to the computer the user shall be then convinced of 
the correctness of the verification result. A diagnostic game can provide valuable 
diagnostic information in an interactive way that a traditional diagnostic routine 
such as counterexample mechanism cannot. Individual efforts have been made 
to implement certain types of diagnostic games. The recent release of Edinburgh 
Concurrency Workbench [1] includes the support for a /r-calculus game and a 
strong bisimulation game. The verification tool Truth [3] also implements a /r- 
calculus game. These tools are designed to use some specific checkers, mostly 
game-based checkers, to build the winning strategy for the computer. It is left 
to see how diagnostic games can be built on top of other existing checkers. An- 
other problem in diagnostic games is that with so many verification semantics, 
each of which requires different rules for game, defining and implementing them 
separately is a daunting task. We introduce PlayGame, a tool that provides a 
consistent interface for implementing diagnostic games and incorporating check- 
ers. Figure 1 shows the architecture of the tool. 

* Research supported by NSF grants GCR-9988489 and GGR-0098037, Army Research 
Office grants DAAD190110003, DAAD190110019, and DAAD190110473. 
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Fig. 1. The architecture of PlayGame 



The tool is designed to use the evidence that is an abstract and uniform 
encoding of the proof constructed by a checker during verification. Its precise 
definition is given in our previous work [7, 8] . Instead of relying on a particular 
checker, the tool works on a variety of existing checkers that can be extended 
to produce such evidence, as discussed in [7,8]. To support different verification 
semantics, we introduce a semantics hierarchy [6] that abstracts game rules to 
different layers. To introduce a new game, one only needs to provide the semantic 
layers unique to the game. PlayGame implements a ^-calculus model-checking 
game and eight different equivalence/preorder games including strong bisim- 
ulation/simulation games, weak bisimulation/simulation games, trace equiva- 
lence/preorder games, and testing equivalence/preorder games that cover all the 
equivalence/preorder semantics supported by Concurrency Workbench - New 
Century (CWB-NC)[2j. PlayGame also provides a consistent user interface for 
all the games that reduces the time required to learn a new game. 

2 PlayGame 

2.1 Designing PlayGame 

The design of PlayGame reflects the game semantics hierarchy defined in [6]. 
The abstract game module implements an abstract version of games and the 
features common to all games. The game semantics module defines the rules for 
each individual game. The strategy module synthesizes winning strategies from 
the evidences submitted by checkers. 

Abstract Game Module. A typical verification game has two players: player 
I, who insists a negative verification result, and player II, who believes otherwise. 
Each game has its own rules that must be determined in the following sense: if 
the correct answer to the verification problem is negative, then I has a strategy 
to win each and every play no matter how II moves; otherwise, II shall have a 
winning strategy. When a game is used as a diagnostic routine, it involves two 
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sides: the computer vs. the user. The computer assumes the role of a player in 
favor of the verification result. Thus, by losing each and every play to the com- 
puter, the user is convinced of the verification result. The abstract game module 
implements the aforementioned abstract version of games. It also introduces the 
role of a referee that enforces the rules supplied by the game semantics module. 
It also implements the common functions including the bookkeeping and the 
user interface. 

Game Semantics Module. Game semantics module defines the rules for each 
game. For the games studied before such as /r-calculus games and strong bisim- 
ulation games [5], our definition is close to previous results but also takes into 
account the human factor. Our revision intends to keep plays concise and infor- 
mative. For example, the definition of a ^-calculus game by Stirling [5] requires 
two steps and the participant of a player to unroll a fixpoint expression or 

vz.^), while the choice of the player unrolling the expression is really irrelevant. 
In the revised rules, it takes only one step and becomes part of the referee's job. 
For those games that to the best of our knowledge have not been defined in lit- 
erature such as testing equivalence/preorder games, we define the games based 
on the target verification semantics. To further improve code sharing we exploit 
similarity among each category of games. For instance, in [6] we introduce a 
generalized equivalence game submodule and semantics plug-ins. To introduce 
an equivalence game, one only needs to supply a relatively small plug-in that 
specifies part of rules unique to this game. In our experience abstracting rules to 
different layers saves about 70-80% coding work when introducing a new game. 

Strategy Module. Synthesizing a winning strategy for the computer is the 
key to diagnostic games. Traditionally winning strategies are constructed by a 
game-based checker [4]. In PlayGame the strategy is constructed from the evi- 
dence supplied by a checker. In [7, 8] we propose uniformly-encoded evidences for 
various verification semantics: for equivalence checking the evidence is a partition 
refinement tree; for preorder checking it is a kernel- auxiliary partition refinement 
tree [7], a variant of partition refinement tree in which each node contains a set of 
upper states {auxiliary set) in addition to a set of lower (and equivalent) states 
{kernel set); for model checking the evidence is a support set [8]. It turns out 
that winning strategies for the games of the same category are quite similar. For 
equivalence games, the strategy is to keep the states of two processes in different 
leaves of the partition refinement tree, and hence the algorithm for synthesizing 
the winning strategy from the evidence is implemented per category basis. 



2.2 Using PlayGame 



PlayGame is implemented on the GWB-NG. To activate a game, the user simply 
issues a verification command with a special flag. The GWB-NG with PlayGame 
calls a checker and enters the interactive game mode after the verification ends. 
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A play starts with the ref- 
eree declaring the roles of the 
computer and the user based 
on the verification result, then 
it proceeds by rounds. Figure 

2 shows a sample round in a 
weak bisimulation game. The 
look and feel of other games 
are quite similar. The referee 
judges the winner with an ex- 
planation. If no one wins yet, 
he decides how a round shall 
proceed. The user is prompted 
for his/her options of the next 
move. The user can also choose 
to take back a few steps. 

3 Conclusions and Future Work 

We introduce PlayGame, an integrated platform for diagnostic games. Two novel 
features in its design are the use of semantics hierarchy, which enables code shar- 
ing among different games, and the building of winning strategy using checker- 
independent evidences, which makes it easier to incorporate new checkers. It 
implements a ^-calculus game and the full range of equivalence/preorder games. 
In future we want to study and implement the diagnostic game for other logics. 
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Starting configuration: 

Agentl: ’out_easy.Strongjobber I ’out_easy.Strongjobber 
Agent2: (Start _easy I Start_easy I Hammer I Maillet) 

\ -Cgethjputhjgetmjputm} 

Referee: User goes first to choose an agent and make a transition. 
Which agent do you choose (1/2)?: 1 
Available user options: 

Option 0: — <^out_easy> — >^out_easy.Strongjobber I Strongjobber 
Option 1: — <^out_easy> — >Strongjobber I ^out_ easy. Strongjobber 
Which transition do you choose?[Type (0-1), or (r)eview options]:! 
Step 1: User chose agent 1 and made transition: 

— <’out_easy> — >Strongjobber I ’out_easy. Strongjobber 
Step 2: Con^ter matchs user’s choice by choosing 
the transition for agent 2: 

— «’out_easy» — >(Start_easy I Jobber 
I Hammer I Mallet)\ -Cgeth,puth,getm,putm3- 
Continue game?[(c)ontinue, e(x)it or (b)ack]:c 

Fig. 2. A sample round 
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1 Introduction 

SAL (see http : //sal . csl . sri . com) is an open suite of tools for analysis of state 
machines; it constitutes part of our vision for a Symbolic Analysis Laboratory 
that will eventually encompass SAL, the PVS verification system, the ICS deci- 
sion procedures, and other tools developed in our group and elsewhere. 

SAL provides a language similar to that of PVS, but specialized for the 
specification of state machines; it was first released with an explicit-state model 
checker as SAL 1 in July 2002; SAL 2, which was released in December 2003, 
adds high-performance symbolic and bounded model checkers, and novel infinite 
hounded and witness model checkers. Both the bounded model checkers can addi- 
tionally perform verification by fc-induction, and the capabilities of all the model 
checkers and their components are available through an API that is scriptable 
in Scheme. 



2 The Language 

The SAL language was originally conceived as an intermediate language and was 
developed in collaboration with the research groups of David Dill at Stanford 
and Tom Henzinger at UC Berkeley. Since then, our version of the language 
has evolved, principally through the addition of a richer type system, including 
structured types and subtypes so that, in addition to its role as an intermediate 
language, SAL is now a comprehensive specification language in its own right. 

SAL’s type system and expression language are similar to those of PVS, 
including higher types, predicate subtypes, datatypes, infinite types such as re- 
als and integers (and their function types), recursive function definitions, and 
quantification. State machines are specified as parameterized modules with state 
variables explicitly identified as input, output, local, or global. The transition 
relation of a module may be specified using both guarded commands and SMV- 
style variable- wise invariants. Primes are used to indicate the values of variables 

* This work was partially supported by the DARPA and USAF Rome Laboratory 
contract F33615-00-C-3043, NASA Langley Research Center contract NASl-00079, 
National Science Foundation grant CCR-ITR-0326540, and by SRI International. 
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in the new state and may appear in guards and in the right-hand sides of assign- 
ments and nondeterministic selections as well as on their left-hand sides. Modules 
may be composed both synchronously and asynchronously (and in combinations 
of these) to yield systems; a renaming construction allows inputs and outputs of 
different modules to be “wired up” appropriately. 

The assertion language is not primitive in SAL but is defined in libraries 
associated with the analyzer concerned. Three of the model checkers that con- 
stitute the analyzers in SAL 2 provide LTL as their assertion language, while 
the witness model checker supports CTL. (Both notations can be used to specify 
formulas in their common subset and SAL translates automatically to the form 
required by the analyzer concerned). 

To support its role as an intermediate language, SAL is defined in XML. 
Parsers and prettyprinters are provided for a human-readable ASCII representa- 
tion, and for a Lisp-like LSAL syntax that is useful in scripting and is translated 
directly into internal representations by the Scheme scripting interface. Because 
the language is so rich, it is easy to translate most other state machine languages 
into SAL. We have a translator from the Stateflow notation of Matlab/Simulink, 
and we expect that ourselves and others will soon provide translators from other 
popular languages. 

3 Preprocessing and Compilation 

Because SAL is a rich language, compiling it into the representations used in 
the deductive cores of its analysis tools (e.g., as BDDs, or as propositional or 
ICS SAT problems) is a substantial task. All the SAL analysis tools share a 
common set of preprocessing and compilation routines that perform extensive 
optimizations. These include partial evaluation, common subexpression elimi- 
nation, and slicing (i.e., cone of influence reduction). For the finite-state model 
checkers, arithmetic values and operators are compiled into bitvectors and binary 
“circuits” respectively, with comparable representations for other SAL types. Re- 
verse translations allow counterexamples to be presented to the user as traces 
through the original SAL specification with variable assignments expressed in 
their original SAL types. LTL assertions are translated to optimized Biichi au- 
tomata. Many transformations and optimizations can be controlled by the user. 

SAL 2 provides a lightweight typechecker, called the SAL well-formedness 
checker, that operates like the typechecker of a programming language: it checks 
that functions and operators are applied to arguments of the correct types, but 
does not perform the deeper checks needed for some of SAL’s richer constructs: 
these require proof obligations similar to TCCs in PVS (although SAL TCCs 
within modules need merely be invariants, not universally valid as in PVS) and 
will be supported by the full SAL typechecker, which is based on that of PVS. 

4 Model Checkers 

SAL 2 provides high performance symbolic and bounded model checkers (SMC 
and BMC, respectively) for systems defined over finite state types, and a novel 
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“infinite bounded” model checker (inf BMC) that can handle infinite as well 
as finite state types; SAL 2.1 added the “witness” model checker (WMC) that 
performs finite-state CTL model checking using a new symbolic method. 

The SMC and WMC symbolic model checkers use the CUDD BDD pack- 
age and provide access to its options for controlling the ordering and dynamic 
reordering of variables. The representation of the transition relation as a BDD 
and the evaluation of the transformed assertion use many optimizations and 
deliver performance comparable to other state-of-the-art symbolic model check- 
ers, most of which start from much more primitive notations. In a case study 
with Holger Pfeifer and Wilfried Steiner concerning fault-tolerant startup of the 
Time-Triggered Architecture (TTA), we routinely analyzed systems with many 
hundreds of state bits and hundreds of billions of reachable states in tens of 
minutes using commodity workstations. 

The WMC model checker implements a novel approach that constructs both 
symbolic witnesses (positive) and counterexamples (negative) for assertions in 
full CTL. This symbolic evidence is useful in abstraction-refinement, vacuity 
checking and controller synthesis, and also allows explicit (trace or tree-like) 
witnesses and counterexamples to be extracted. 

The BMC model checker uses a propositional SAT solver to search for coun- 
terexamples no longer than some specified “depth” (i.e., length); the model 
checker can be instructed to advance the depth incrementally, so that it will find 
the shortest counterexample, and it can also verify properties by fc-induction 
(optionally using other formulas as lemmas). By default, SAL uses ICS as its 
SAT solver, but it can optionally be instructed to use zChaff or GRASP. In our 
TTA startup example, the SAL bounded model checker would often solve prob- 
lems having hundreds of thousands of DAG nodes in their SAT representations 
(and more than 600 variables in a BDD representation) in a few minutes. 

The inf BMC model checker uses the standard formulation of bounded model 
checking, but instead of translating into a purely propositional SAT problem, 
it translates to the theory supported by ICS. Although ICS is competitive as a 
pure SAT solver, it is actually a decision procedure and satisfiability solver for 
the combination of ground (i.e., unquantified) real and integer linear arithmetic, 
equality with uninterpreted function symbols, products (i.e., tuples) and co- 
products (i.e., disjoint sums), propositional calculus and propositional sets, and 
restricted forms of lambda calculus, bitvectors, and arrays. Like its finite coun- 
terpart, the inf BMC model checker can advance its depth of search incrementally 
and can perform ^-induction. Counterexamples are presented symbolically. Al- 
though inf BMC uses ICS as its default satisfiability procedure, it can also be 
instructed to use UCLID, SVC, CVC, or CVC-Lite, albeit with restrictions (e.g., 
UCLID decides less theories than ICS) and without counterexamples. 

Using real or unbounded integer state types, SAL can represent infinite state 
systems such as hybrid or timed automata, and other formulations of continuous 
or real-time behavior. For example, with Bruno Dutertre, we have developed 
a timed formulation for TTA startup: instances with up to 10 nodes (whose 
representation uses 24 real and 99 discrete variables) can be verified in a few 
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minutes using inf BMC to perform 1-induction on a series of lemmas. Instances 
of Fischer’s real time mutual exclusion algorithm with as many as 39 nodes have 
been verified in the same way. 



5 Scripting and the SAL Simnlator 

The preprocessing and model checking components of SAL can be accessed 
through an API defined in Scheme. The actual model checkers are simply Scheme 
scripts defined over this API. Users can write their own scripts to perform spe- 
cialized analyses using the full resources of SAL. The SAL Simulator provides 
a convenient environment in which to develop such scripts: it is essentially a 
read-eval-print loop with the SAL libraries preloaded. Used as a simulator, it 
allows users interactively to explore a specification by executing selected tran- 
sitions, filtering the current set of states, or finding a path to a state satisfying 
a given assertion. Used as an environment for scripting, all the capabilities de- 
scribed above can be employed within user-written Scheme functions. For exam- 
ple, with Gregoire Hamon we have used this capability to develop a prototype 
test case generator for Stateflow that first uses symbolic model checking to find 
a path to some previously unvisited state or transition, then alternates slicing 
and bounded model checking to extend the path to additional unvisited targets. 

6 Plans for Further Development 

SAL 1, which is still available, provides an explicit-state model checker for a 
subset of the language supported in SAL 2. We intend to redevelop this model 
checker and to integrate it with the others in forthcoming versions of SAL. We 
will also integrate the extensions for specifying and abstracting hybrid systems 
developed by Ashish Tiwari. 

Over the longer term, we intend to integrate SAL with PVS (so that, for 
suitable specifications, it will be possible to translate SAL into PVS, and vice- 
versa), and to evolve both into an open scriptable environment for symbolic 
analysis in which numerous tools, developed by ourselves and others, will interact 
through a SAL Tool Bus. The tool bus will extend the SAL language with XML 
representations for the many artifacts and intermediate products of analysis: for 
example, invariants, abstractions, counterexamples, test cases and their outputs. 



7 Current Status and Availability 

SAL 2 with all the capabilities described is freely available for noncommer- 
cial research purposes (i.e., roughly, research that will be openly published) 
from http://sal.csl.sri.com. Binary versions of the system, which require 
an automatically-generated license key, may be downloaded for Linux, Solaris, 
MacOS X, and Cygwin (for Windows). The SAL binaries also install the ICS 
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executable. The SAL and ICS source code is available with a signed license agree- 
ment. The top-level page for tools developed by our group is http://fm.csl. 
sri . com, from which you can find links to our Roadmap, papers, examples, and 
a tutorial illustrating all our tools. 

For want of space, all references have been omitted here; they are present in 
an expanded version pf the paper available at 
http : //www. csl . sri . com/~rushby/abstracts/sal-cav04. 




Formal Analysis of Java Programs in JavaFAN 



Azadeh Farzan, Feng Chen, Jose Meseguer, and Grigore Ro§u 



Department of Computer Science 
University of Illinois at Urbana-Champaign 
{ af arzan , f engchen , meseguer , grosu}@cs . uiuc . edu 



Abstract. JavaFAN is a Java program analysis framework, that can 
symbolically execute multithreaded programs, detect safety violations 
searching through an unbounded state space, and verify finite state pro- 
grams by explicit state model checking. Both Java language and JVM 
bytecode analyses are possible. JavaFAN’s implementation consists of 
only 3,000 lines of Maude code, specifying formally the semantics of Java 
and JVM in rewriting logic and then using the capabilities of Maude for 
efficient execution, search and LTL model checking of rewriting theories. 



1 Introduction 

JavaFAN (Java Formal ANalyzer) is a tool to simulate and formally analyze 
multithreaded Java programs at source code and/or bytecode levels. A novel 
feature of JavaFAN’s design is that it is directly based on formal definitions of 
the Java and the JVM semantics in the form of rewrite theories that are efficiently 
executed and analyzed in the Maude language [2] . The following types of analysis 
are supported: (1) symbolic simulation, with Java and JVM specifications used 
as interpreters executing programs with actual or symbolic inputs; (2) breadth- 
first search (BFS) within a concurrent program’s state space to find violations 
of safety properties; (3) model checking of linear temporal logic (LTL) properties 
for programs whose state space is finite. These forms of analysis are efficiently 
supported by Maude’s underlying rewriting, breath-first search, and LTL model 
checking features [2]. To keep the framework user-friendly, JavaFAN wraps the 
Maude specifications and accepts Java or JVM code from the user as input. 

JavaFAN’s specification-based design has a number of important advantages: 
(1) formal specifications provide a rigorous semantic definition for a language 
that can be mathematically scrutinized; (2) such formal specifications can be 
developed with relatively little effort, even for large languages like Java and the 
JVM; (3) Maude’s underlying formal analysis infrastructure is entirely generic, 
so that formal analysis tools become available for free for each language so spec- 
ified; and (4) in spite of their generality, the formal analyses can be performed 
with competitive performance. Section 3 discusses several examples, providing 
analysis times and comparisons of JavaFAN with other related analysis tools. 

The reason for JavaFAN’s efficiency is twofold. On the one hand, Maude 
has a rewrite engine achieving millions of rewrites per second, an efficient BFS 
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Fig. 1. Architecture of JavaFAN. 



algorithm, and an explicit state LTL model checker with performance compa- 
rable to SPIN [3]. On the other hand, our approach in specifying the semantics 
of a concurrent language as a rewrite theory in Maude tries to maximize per- 
formance. A rewrite theory is a triple with S a signature declaring 

types and function symbols, E a set of equations, and R a set of rewrite rules. 
Intuitively, the equations E specify the semantics of a language’s deterministic 
computations, whereas the rules R specify its concurrent transitions. The point 
is that the state space is only defined by the rules R\ the smaller R is, the more 
efficient the analysis. In JavaFAN, R is indeed small compared to E: for Java 
|i?| = 15 and \E\ = 600, and for JVM \R\ = 40 and \E\ = 300. A continuation- 
based semantics also increases performance, because most equations and rules 
then become unconditional and thus more efficient to execute. Finally, by distin- 
guishing between the static and dynamic parts of a program, only the dynamic 
component is kept in the state representation, with huge resource savings for 
large programs. 

2 Overview of JavaFAN 

Figure 1 presents the architecture of JavaFAN. The user interface module hides 
the Maude back-end behind a user-friendly environment. It also plays the role of 
a dispatcher, sending the Java source code and/or the bytecode to Java and/or 
JVM analyzers, respectively. The analyzers wrap the input programs into prop- 
erly defined Maude modules and invoke Maude, which analyzes the code based 
on formal specifications of the Java language and the JVM. The output format- 
ter collects the output of Maude, transforms it into a user-readable format, and 
sends it to the user. We use Maude to specify the operational semantics of a suffi- 
ciently large subset of Java and the JVM, including multithreading, inheritance, 
polymorphism, object references, and dynamic object allocation. Native meth- 
ods and many of the Java built-in libraries are not currently supported. Java and 
the JVM are modeled differently. For Java, a quite efficient continuation-based 
style is adopted, while for the JVM we use an object oriented style that makes 
the specification simpler and easier to understand. 
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Table 1. Thread Game Times. 



N 


JVM 


Java 


50 


7.2 


2.7 


100 


17.1 


6.6 


200 


41.3 


17 


400 


104 


54.7 


500 


4.5m 


2m 


1000 


10.1m 


5.1m 



Continuation- Based Semantics of Java. The semantics of Java is defined 
modularly - different features of the language are defined in separate modules 
- to ease extensions and maintenance. A state is a multiset of state attributes, 
such as threads, memory, synchronization information, etc. To support multi- 
threaded programs, we introduce the notion of thread context, which consists 
of three components: (1) a continuation, (2) the thread environment, and (3) 
the corresponding object. The continuation maintains the control context of the 
thread, which explicitly specifies the next steps to be performed by it. 

Object-Based Semantics of the JVM. The state of the JVM is represented 
as a multiset of objects and messages in Maude. Objects in the multiset fall 
into four major categories: (1) objects which represent Java objects, (2) objects 
which represent Java threads, (3) objects which represent Java classes, and (4) 
auxiliary objects used mostly for definitional purposes. Rewrites (with rewrite 
rules and equations) in this multiset (modulo associativity, commutativity, and 
identity) model the changes in the state of the JVM. In each rewrite, there is 
usually one thread involved, together with classes and/or objects that may be 
needed to execute the next bytecode instruction. 

3 Experiments 

Using the underlying search and model checking features of Maude, JavaFAN 
can be used to formally analyze Java programs. Breadth-first search analysis 
(supported through Maude’s search command) is a semi-decision procedure that 
can be used to explore all the concurrent computations of a program, looking 
for safety violations characterized by a pattern and a condition. This empowers 
JavaFAN to analyze programs with possibly infinite state spaces. For finite state 
programs, it is also possible to perform explicit-state model checking (using 
Maude’s model checker) of properties specified in linear temporal logic (LTL). 

JavaFAN has effectively been applied on a number of examples. Performance 
results are given in seconds on a 2.4 GHz Linux PC. Detailed discussions on the 
examples can be found in [4]. Results are given at both Java and JVM levels. 

Remote Agent (RA) [5] has two running threads: a planner that generates 
plans from mission goals, and an executive that executes the plans. The code 
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Table 2. Dining Philosophers Times. 



Tests 


JVM 


Java 


DP(5) 


4.5 


9.9 


DP(6) 


33.3 


81.7 


DP(7) 


4.4m 


15.1m 


DP(8) 


13.7m 


98m 


DP(9) 


803.2m 


— 


DF(5) 


3.2m 


19.2 


DF(6) 


23.9m 


2.4m 


DF(7) 


686.4m 


27m 



contains a missing critical section, that leads to a data-race between two con- 
current threads, which further caused a deadlock. JavaFAN finds the deadlock 
in 0.3 of a second in the bytecode level and 0.09 of a second in the source-code 
level, while the tool in [8] finds it in more than 2 seconds in its most optimized 
version. 

Thread Game [7] is a simple multithreaded program which shows the possible 
data races between two threads accessing a common variable. Each thread reads 
the value of the static variable c twice and writes the sum of the two values back 
to c. The question is what values can c possibly hold during the infinite execution 
of the program. Theoretically, it can be proved that all natural numbers can be 
achieved [7]. JavaFAN (using the search command) addresses this question for 
any specific value of N. Table 1 presents the result for some numbers. 

Dining Philosophers. We have model checked a deadlock-prone (DP) and a 
deadlock- free (DF) version of this problem. The property that we have model 
checked for this example is whether all the philosophers can eventually dine. 
The LTL formula is OCheck(N), where N is the number of philosophers. The 
proposition Check(N) is specified in Maude by the user. For the deadlock-prone 
version, the model checker generates a counterexample - a sequence of states 
that leads to the deadlock - and states the property to be true in the deadlock- 
free version. Currently, JavaFAN can detect the deadlock for up to 8 philosophers 
at the bytecode level and up to 9 philosophers at the Java Language level in a 
reasonable amount of time (Table 2) . It can also prove the program deadlock- free 
for up to 7 philosophers at both levels. This compares favorably with JPF [1,6] 
which for the same program cannot deal with 4 philosophers. 

2-Stage Pipeline is a pipeline computation, where each pipeline stage executes 
as a separate thread. Stages interact through connector objects. The property we 
have model checked for this program states the “eventual shutdown of a pipeline 
stage in response to a call to stop on the pipeline’s input connector”. The LTL 
formula for the property is □(clstop — > O(^stagelreturn)). JavaFAN model 
checks the property and returns true in 17 minutes (no partial order reduction 
was used) . This compares favorably with the model checker in [8] which without 
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using the partial order reduction performs the task in more than 100 minutes 
(both on a 2.4GHz PC). 

4 Conclusions and Future Work 

We have discussed JavaFAN’s design, and experiments suggesting that it can an- 
alyze Java and JVM programs with competitive performance. Perhaps the most 
important experience gained is that a formal specification based methodology 
to develop formal analysis tools for concurrent programs is a rigorous, cost- 
effective, and practical approach when realized on a high-performance logical 
engine. The methodology itself is generally applicable to other languages. Fu- 
ture work will involve further experimentation, development of similar tools for 
other languages, partial order reduction optimization, and widening the range of 
formal analyses supported, including program abstraction and theorem proving. 
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1 Introduction 

We present a toolset for design and verification of Globally Asynchronous Lo- 
cally Synchronous(GALS) systems. Such systems consist of a network of reactive 
nodes which have independent clocks and I/O interfaces, and communicate us- 
ing complex synchronisation mechanisms. GALS systems are gaining prevalence 
in avionics, embedded systems, and VLSI design. These systems are difficult to 
design and verify due to the concurrency and complex interaction involved. 

The toolset is based on a visual formal language called Gommunicating Re- 
active State Machines(GRSM)[6], which builds upon Gommunicating Reactive 
Processes [2]. It seamlessly integrates a graphical editor, a simulator and a veri- 
fication engine. It has several novel aspects in the areas of language design and 
verification. The semantics of GRSM consolidates ideas from the synchronous 
languages with classical concurrency constructs. The simulator implements a 
distributed protocol to incorporate pre-emption with asynchronous communica- 
tion and supports distributed simulation with context switches. Properties are 
specified using distributed observers and verified using Spin[4]. The verification 
engine includes a non-trivial translation from GRSM, an open system with GALS 
semantics, to Promela, a closed system with asynchronous semantics. In addi- 
tion, Spin has been modified to generate counter examples that can be viewed 
directly in the simulator. 

We have used the tools to model and verify standard pedagogical examples, 
and for technology transfer in a company. We have found GRSM well suited for 
providing cycle accurate descriptions of control dominated architectures with 
multiple clock domains. Industrial case studies include a multi-processor System- 
on-Ghip(SoG) application and a bus protocol. In this paper, we illustrate these 
tools using a case study. Section 2 introduces underlying theory. Section 3 dis- 
cusses the tools, implementation issues, and our experience, and Section 4 con- 
cludes. 

2 Communicating Reactive State Machines 

A GRSM is a network of nodes built from communicating boolean Mealy-style 
automata using constructs for synchronous and asynchronous parallel composi- 
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DSP 




Fig. 1. ARM and DSP components of Infophone 



tion, hierarchy, and signal hiding. The nodes are locally synchronous, execute 
concurrently, emit signals via synchronous broadcast, and communicate on point- 
to-point channels using CSP-style rendezvous. A formal description is provided 
in [9]. We illustrate CRSM using Infophone^, a speech enabled Java application 
for information retrieval. It uses an ARM processor to control the user interface, 
a DSP to process speech commands, and a wireless web interface. 

A CRSM description of Infophone is written as ARM/ / DSP/ /WEB, where 
//is the operator for asynchronous parallel composition. Figure 1 shows sim- 
plified versions of ARM and DSP. The rectangles and circles represent passive 
and communication states respectively. The dashed arrows from communication 
states denote transitions taken when communication succeeds and solid arrows, 
transitions taken when their guards are true; guards describe the status of signals 
in the environment. 

When activated by the signal usrMenu, ARM receives the user’s request, say 
fltReq and forwards it to DSP on the relevant channel, in this case Flight. 
DSP receives speech commands, sends a request to WEB on WReq and notifies 
ARM when a response is received on WResp. A session ends in three ways: 
successfully, when ARM receives a message on the channel Done from DSP, 
times out, when a timeout is issued by DSP if WEB is not responding, and 
aborts, when the user issues usr Abort. 

The state AwaitDSP in ARM has hierarchy and contains two automata. 
The transitions leaving AwaitDSP allow these automata to complete their on- 
going reaction before passivating them, a policy of weak pre-emption. The au- 
tomata Flightinfo, Stockinfo, Weatherinfo and UserAbort in DSP exe- 
cute in synchronous parallelism, written FlightInfo||..||UserAbort and interact 



^ Infophone was developed on the Open Multimedia Application Platform(OMAP), a 
trademark of Texas Instruments. 
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Fig . 2 . Distributed Observers: Oarm and Odsp 



using local signals such as exit, term and finish. Nodes are required to be 
deterministic and constructive [1, 9]. 

Safety properties of CRSMs are specified using distributed observers. An 
observer monitors the status of a node, communicates with other observers, 
and enters a special state Bug when a property is violated. Verification involves 
checking the system {ARM\\Oarm)/ I{DSP\\Odsp)I /{WEB\\Oweb) for the 
reachability of the state Bug. The observers in Figure 2 specify that a session 
terminated in ARM should be terminated in DSP within its next cycle. The 
conditional iri-{state) holds if the state is active at the end of the reaction. 
A subtle error occurs when usr Abort is issued in ARM and timeout in DSP. 
ARM transits to the state Abortl, communicates with the automaton UserAbort 
in DSP and enters Idle. DSP consequently enters the state TMoutl. The next 
time f ItReq is issued in ARM, the system will deadlock. 

3 Experience and Discussion 

We have developed a graphical environment which integrates the design and 
verification flow described. CRSM models can be built using a graphical editor 
or a textual language tCRSM. The execution sequences can be viewed in the 
simulator, which performs a must and can analysis to determine the status of 
local signals[l] and implements a distributed protocol[7] to address issues due to 
pre-emption in the presence of communication [8]. 

Model checking is performed by translating the system to Promela[5, 10]. The 
Promela code for each node includes a reactive kernel and an environment pro- 
cess, and ensures that the status of signals and states in the system are evaluated 
correctly. Signal hiding requires must and can analysis to be incorporated in the 
Promela code. Spin is invoked automatically with the specification D^Bug^always 
not Bug) and counter examples generated are translated into traces and displayed 
in the simulator. Spin has been modified for this purpose. 

The tools described have been implemented in approximately 30,000 lines 
of C and Java code. Our case studies include Infophone and a proprietary bus 
protocol used by Texas Instruments. The Infophone system with its observers 
was translated into 890 lines of Promela code with 107 boolean variables, while 
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the bus protocol was translated to 270 lines of Promela code with 31 boolean 
variables. Boolean variables are required for state and signal encoding, perform- 
ing analysis, providing observer related primitives and implementing rendezvous. 
The absence of local signals and asynchronous communication resulted in fewer 
booleans in the bus protocol code. We observed that CRSM models provided 
Register Transfer Level(RTL) style structure, yet complete and cycle accurate 
descriptions of the bus protocol. In addition, the designers find temporal logics 
intimidating and prefer state machine based specifications. 

4 Conclusion 

We have presented a tool set for modelling and verification of GALS applications. 
Tools such as SAL/P VS, Polls, Reactive Modules, and SMV capture both mod- 
els of concurrency but significant semantic differences exist such as notions of 
acceptable programs and pre-emption mechanisms. Our work differs from most 
Statecharts based verification engines for similar reasons. 

At present, we have developed static analysis[3] and refinement techniques[9] 
to ameliorate verification. These techniques are currently being incorporated in 
the tool. We are also exploring other techniques that might aid the designers 
and are conducting further case studies. 
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1 Introduction 

This paper presents Web Service Analysis Tool (WSAT), a tool for analyzing and ver- 
ifying composite web service designs, with the state of the art model checking tech- 
niques. Web services are loosely coupled distributed systems communicating via XML 
messages. Communication among web services is asynchronous, and it is supported by 
messaging platforms such as JMS which provide FIFO queues to store incoming mes- 
sages. Data transmission among web services is standardized via XML, and the spec- 
ification of web service itself (invocation interface and behavior signature) relies on a 
stack of XML based standards (e.g. WSDL, BPEL4WS, WSCI and etc.). The charac- 
teristics of web services, however, raise several challenges in the application of model 
checking: (1) Numerous competing web service standards, most of which lack formal 
semantics, complicate the formal specification of web service composition. (2) Asyn- 
chronous messaging makes most interesting verification problems undecidable, even 
when XML message contents are abstracted away [3]. (3) XML data and expressive 
XPath based manipulation are not supported by current model checkers. 

WSAT, as shown in Fig. 1, tackles these challenges as follows: (1) An Intermediate 
Representation: We use automata with XPath guards (called GFS A) as an intermediate 
representation for web services. A translator from BPEL4WS to GFSA is developed, 
and support for other languages can be added without changing the analysis and the 
verification modules of the tool. (2) Synchronizability and Realizability Analyses: 
We define a set of sufficient synchronizability conditions to restrict control flows of a 
composite web service. When the analysis succeeds, LTL verification can be performed 
using the synchronous communication semantics instead of asynchronous communica- 
tion semantics. We also define a set of sufficient realizability conditions that are used to 
synthesize a set of GFSA (called peers) which communicate with asynchronous mes- 
sages from a single GFSA (called a conversation protocol) which specifies the set of 
desired global behaviors. The behaviors of the synthesized peers are the same as the 
behaviors of the conversation protocol if the conversation protocol is realizable [3]. (3) 
Handling of XML Data Manipulation: We developed and implemented algorithms 
for translating XPath expressions to Promela code [5], and we use model checker SPIN 
[7] as the back-end of WSAT to check LTL properties. 

2 Guarded Finite State Automata 

A composite web service can be specified in either bottom-up or top-down fashion. 
Formally, for a composite web service, its bottom-up specification (called a web service 
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Fig. 1. WSAT architecture 



composition) [4] is described as a tuple {{P, M), Ai, . . . ,An), and its top-down spec- 
ification (called a conversation protocol) [2,3,6] is specified as a tuple {{P,M),A). 
Here (P, M) is the composition schema where P is a set of peer prototypes, and M is a 
set of message classes. Guarded Finite State Automata (GFSA) , . . . , An are the peer 

implementations (supposing |P| = n), and A specifies the desired set of global behav- 
iors. Below, we present a fragment of the Stock Analysis Service (SAS) conversation 
protocol studied in [5]; 



Conversation Protocol! 

Composition Schema { 

Peer List { Inv, SB, RD} , 

MSL Type List{ 

Register [ 
orderlD [xsd : int] , 
reqList [stockID [xsd : int] { 1 , 10 } ] , 
payment [ 
account [xsd: int] | 
credit Card [xsd : int] 

] ] , ... 

Message List{ 

register { Inv- >SB : Register} , 
bill (SB- >Inv : Bill } , . . . 

}} 



gfsa{ 

States(sl,s2, . . . ,sl2}, 

InitialState (si), 

Final States { s3 } , 

TransitionRelationj 
tl4{s8 -> sl2 : bill, 

Guard! 

$request//stockID = 

$ register/ / stockID [position ( ) =last ( ) ] 






$bill// order ID 



$register/ /orderlD 



}7/end of TransitionRelation 
}//end of GFSA 

}//end of Conversation Protocol 



As shown above, each message class has a type defined using MSL [1], a compact 
theoretical model of XML Schema. WSAT supports a fragment of MSL, where complex 
types can be constructed using sequence ‘ , ’ (e.g. the Register) and choice ‘ | ’ (e.g. the 
payment) operators. An MSL type can also have multiple occurrences (e.g. payment 
can have 1 to 10 stockID children), however, maximum occurrence must be bounded. 

A GFSA is a tuple (M, T, s, F, A). M is the message class set in the composition 
schema. T, s, F are the set of states, initial state, and the set of final states, respectively. 
A is the transition relation. Each transition r S is of the form r = (s, (c, 5), f), where 
s,t & T are the source and the destination states of r, c G M is a message class and g is 
the guard of the transition. Guards are written using XPath expressions. WSAT supports 
a subset of XPath which consists of the following operators: child axis (/), descendant 
axis (//), self-reference (.), parent-reference (. .), basic type test (bO), node name 
test (t), wildcard (*), function calls position () and last(), and predicates ([]). 
Arithmetic and boolean constraints can be used as predicates in WSAT. 
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As shown in the SAS protocol, a guard consists of a guard condition and a set of 
assignments which specify the contents of the message that is being sent. For example 
the guard of transition tl4 specifies that: if the stockiD attribute in the request 
message is the last stockiD in the register message, then send out a bill message 
whose orderlD attribute matches the orderlD of register. The powerful XPath 
language allows guards to express very rich semantics. In [4] we showed that static 
BPEL4WS web services can be translated into GFSA representation without loss of 
data semantics. 



3 Synchronizability and Realizability Analyses 

Consider the simple client-server web service composition given in the following fig- 
ure. A requester and a server interact with each other via three request messages and 
one acknowledgment message. Recall that each peer is equipped with a queue to store 
incoming messages. It is not hard to infer that the composition has an infinite number 

of configurations, because the requester can 
send arbitrary number of ri and r 2 mes- 
sages (which are stored in the queue of the 
server) before any acknowledgment is sent 
back. However, another interesting observa- 
tion is that the global behaviors, character- 
ized by the sequence of messages, is a regu- 
lar language (ri | V 2 \ ra)*e, which is the set of behaviors generated by the composition 
under synchronous semantics (i.e., the Cartesian product of the two automata). Since in 
our model [3], LTL properties are defined over the global behaviors, it is decidable to 
check LTL properties for such services. 

We say a web service composition is synchwnizable if it generates the same set of 
global behaviors for both synchronous and asynchronous semantics. In [4] we present 
three sufficient synchronizability conditions to identify synchronizable web service 
compositions. For each synchronizable web service composition WSAT will generate a 
Promela specification with synchronous (rendezvous) communication (by limiting the 
Promela channel size to 0), and then call SPIN to verify LTL properties on the syn- 
chronous specification. The verified LTL properties are guaranteed to be satisfied by 
the original asynchronous web service composition. For top-down specified conversa- 
tion protocols, we developed a similar analysis called realizability analysis [3], which 
is further extended to consider message contents in [6]. 



requester server 




4 Handling XML Data Manipulation 

WSAT translates each GFSA into a Promela process. The central issue of the translation 
algorithm is how to handle XML data and XPath expressions. Each MSL type declara- 
tion is translated into a record type (“typedef”) in Promela, and types with multiple 
occurrences are translated into Promela arrays. For example, the stockiD in the SAS 
protocol is mapped into a Promela array of size 10. Based on the type mapping, XPath 
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expressions can be translated into Promela code. When MSL types with multiple occur- 
rences are involved, the translation is essentially a nested-loop. For example, consider 
the following boolean XPath expression: $regl//stockiD = $reg2//stockiD, 
where $regl and $reg2 are two XML variables of type Register that is defined in 
the SAS protocol. Note each side of the equality is a location path which returns a set of 
stockiD values. According to XPath standard, the expression evaluates to true if we 
can find one value from each side to satisfy the equality. Hence the expression captures 
the query: “Is there any stockiD value which appears in both $regl and $reg2?”. 
The corresponding Promela translation is a two-layer nested loop which searches the 
two arrays (that correspond to the stockiD array of $regl and $reg2 respectively), 
to find a pair of array elements that satisfy the equality. When function calls such as 
position 0 and last () are involved (e.g. the transition guard of tl4 in the SAS 
protocol), the translation is more complex. The main idea is to substitute the appear- 
ance of a function call with an integer variable, and properly update its value so that 
when the function is called the integer variable contains the right value. More details 
are available in [5]. 

We applied WSAT to a range of examples, including six conversation protocols con- 
verted from the IBM Conversation Support Project [8], five BPEL4WS services from 
BPEL4WS standard and CoIlaxa.com, and the SAS from [5]. Synchronizability and re- 
alizability analysis are applied to each example, and except two conversation protocols, 
all examples pass these checks. This implies that the sufficient conditions in our syn- 
chronizability and realizability analysis are not restrictive and they are able to capture 
most practical applications. For each example, we generated the corresponding Promela 
specification using WSAT, and we checked LTL properties of the form “G(p — > Fg)” 
using SPIN. Our experience with these examples suggests that while exhaustive search 
of the state space may be very costly for verifying correct properties, SPIN’S perfor- 
mance at discovering false LTL properties is satisfactory. For instance, we identified 
a very delicate design error (a misuse of XPath position ( ) function in a transition 
guard) in the SAS example [5] using SPIN. 

WSAT can be extended in the future, by supporting other web service specification 
languages at the front end, and targeting different verification tools at the back-end. We 
are especially interested in extending WSAT with symbolic verification techniques in 
order to handle large state spaces generated by XML data. 
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Abstract. We describe a tool called CVC Lite (CVCL), an automated 
theorem prover for formulas in a union of first-order theories. CVCL 
supports a set of theories which are useful in verification, including un- 
interpreted functions, arrays, records and tuples, and linear arithmetic. 
New features in CVCL (beyond those provided in similar previous sys- 
tems) include a library API, more support for producing proofs, some 
heuristics for reasoning about quantifiers, and support for symbolic sim- 
ulation primitives. 



1 Introduction 

Decision procedures for decidable fragments of first-order logic continue to at- 
tract users and interest in a wide variety of verification efforts. 

CVC Lite (CVCL) is a tool for determining the validity (or satisfiability) of 
first-order formulas over a union of specific useful theories. It replaces the original 
Cooperating Validity Checker (CVC) [7], which, in turn, was a successor to the 
Stanford Validity Checker (SVC) [4]. The name does not imply that the new 
system is less powerful than CVC, but rather was chosen because after learning 
from our experience with CVC, we felt we could create a tool which, without 
sacrificing functionality, would be smaller, faster, and easier to use and maintain. 

Although CVCL is a work in progress, in many respects it has already val- 
idated our vision and rewarded the effort involved in a reimplementation. In 
particular, the code base is one third the size of CVC, the performance is com- 
parable, and it has been used and enhanced by a number of people outside the 
core group of developers. In addition, CVCL has many new features, not found 
in any of the previous systems. 

In this paper, we will describe the theory and features of CVCL, with an 
emphasis on what is new as compared to the previous systems (especially CVC). 
We begin with a brief overview of the system and the theories which are currently 

* This research was supported by a grant from Intel Corporation and by National 
Science Foundation CCR-0121403. 
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supported in CVCL. Then we describe the features which are new in CVCL and 
conclude with some example applications. 

2 Overview 

CVCL accepts as input one or more assertion formulas and a query formula. 
It then checks whether the assertion formulas imply the query formula. Each 
formula must be a first-order formula whose parameters (non-logical symbols) 
must be from among the theories listed in the next section. 

The algorithm used depends on the Nelson-Oppen method for combining 
decision procedures [6] and the implementation is based closely on an algorithm 
whose correctness is verified in the first author’s Ph.D. thesis [3]. 

Although there is limited support for quantifiers in CVCL (see below), the 
algorithm is complete only for quantifier-free formulas. As with its predecessor, 
CVCL uses advanced SAT-based search heuristics and has the ability to produce 
a proof when a formula is successfully validated. 

3 The Theories of CVCL 

3.1 Equality with Uninterpreted Functions 

The simplest supported theory is one which contains an arbitrary number of 
functions and predicates which are “uninterpreted” , meaning that the theory 
does not provide any information about them other than that they are func- 
tions and predicates. Because the set of non-logical symbols in this theory varies 
according to the formulas being checked, the user must specify the set of such 
functions and predicates for a particular run of CVCL. 

3.2 Arrays 

CVCL includes a theory of abstract arrays with two operations, read and write 
which can be used to read from a location in an abstract array or to create a 
new array by writing a new value to a location in an existing array. 

3.3 Records and Tuples 

CVCL formulas can include simple aggregate datatypes like records and tuples. 
These are handled with a simple decision procedure for a set of operations used to 
create, read from, and write to these datatypes (much like the array operations). 

3.4 Arithmetic 

As with its predecessors, CVCL can decide the theory of linear arithmetic over 
the reals. However, CVCL also has some additional capabilities. The first is the 
ability to deal with linear arithmetic over integers. In fact, CVCL can reason 
about linear expressions over any combination of real and integer variables. 

The other extension implemented in CVCL is the ability to handle some 
nonlinear arithmetic. Nonlinear expressions are transformed into a normal form, 
making it possible to verify simple identities like {a+b){a — b) = a? — b^. However, 
the nonlinear capabilities of CVCL are still very limited. 
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3.5 Additional Theories 

Currently, new decision procedures are being developed for inductive datatypes, 
a subset of set theory, and a theory of bit- vectors. 

4 New Features 

4.1 Library API 

One of the main features lacking in both SVC and CVC was a library interface. 
Interaction with the old systems was done using a small custom command lan- 
guage. Commands were either typed in manually or provided through a scripting 
mechanism. 

CVCL has the same command language interface, but we also designed an 
abstract interface into CVCL from the start. The methods in this API mimic the 
command language, so that it is easy to move from one mode of interaction to the 
other. In fact, the command language interface is implemented using the API, 
minimizing the chance that the two modes of interaction will behave differently. 

The API is available both as an abstract C-I--I- class and as a set of C func- 
tions. It has been successfully used as a library from C-|— 1-, and the C interface 
has been successfully used by the foreign function interface of other languages 
including Prolog and Ocaml. 

4.2 Proof Support for Efficient Boolean Reasoning 

A major feature of the original CVC system was the ability to produce a proof 
artifact as the result of successfully validating a formula. However, CVC could 
only produce proofs when using a slow search heuristic. When using advanced 
SAT-based heuristics, which are essential on large formulas, CVC was unable to 
produce a proof because it depended on an external SAT solver and had no way 
to extract a proof from this solver. 

CVCL overcomes this difficulty by integrating a custom SAT solver and in- 
cluding proof rules for the kinds of reasoning done in modern efficient Boolean 
SAT solvers [2]. This enables CVCL to use advanced techniques like clause learn- 
ing and conflict-directed backtracking while still producing proofs. 

4.3 Quantifiers 

One of the most significant new features of CVCL is native support for quan- 
tifiers. Adding quantifiers necessarily makes the logic undecidable, but in many 
practical examples, even very simple heuristics for quantifier instantiation can 
be sufficient. 

The current heuristic used by CVCL is to collect the set of terms that have 
occurred in some previous formula, and then use these terms to instantiate the 
quantified variables of similar type. This is a very close reimplementation of 
the heuristic used by Das and Dill [5] for solving quantified formulas arising in 
predicate abstraction. 
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4.4 Symbolic Simulation 

A primitive interface for symbolic simulation was built into CVC, and success- 
fully applied to applications in hardware verification [1]. CVCL provides a more 
extensive and intuitive interface to symbolic simulation primitives. 

5 Conclusion 

Since becoming available in August 2003, CVCL has been downloaded by many 
research groups and used in a wide variety of verification efforts in both hardware 
and software. 

One representative example is the work on compiler validation being done 
at NYU. CVCL is used to verify the verification conditions generated by a tool 
which checks the correctness of transformations done by an optimizing com- 
piler [8]. 

CVCL has an active user and development community. More information, 
including instructions for downloading and installing the tool, is available at the 
CVCL web page: http://verify.stanford.edu/CVCL. 
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Abstract. CirCUs is a satisfiability solver that works on a combination of And- 
Inverter-Graph, CNF clauses, and BDDs. It has been designed to work well with 
bounded model checking. It takes as inputs a Boolean circuit (e.g., the model 
unrolled k times) and an optional set of additional constraints (for instance, re- 
questing that a solution correspond to a simple path) in the form of CNF clauses 
or BDDs. The algorithms in CirCUs take advantage of the mixed representation 
by applying powerful BDD-based implication algorithms, and decision heuris- 
tics that are objective-driven. CirCUs supports incremental SAT solving, early 
termination checks, and other analyses of the model that translate into SAT. Ex- 
perimental results demonstrate CirCUs’s efficiency. 



1 Introduction 

Efficient satisfiability (SAT) solvers [15, 18, 11,6] have helped make Bounded Model 
Checking (BMC [2]) a widely used alternative to BDD-based model checking. The 
performance of BMC heavily depends on that of the SAT solver. At the same time, 
SAT-based model checking goes beyond the simple check for the existence of coun- 
terexamples, and involves a wide array of analyses such as early termination checks for 
invariants [13] and general LTL properties [1]. SAT solvers are also used in verifica- 
tion algorithms that combine BDDs and CNF [4], in the computation of concise proofs 
of satisfiability [12] and unsatisfiability [19], and in unbounded model checking [9, 
10]. CirCUs is a SAT solver designed to be flexible enough to be used in all these tasks, 
while exploiting knowledge of the problem structure to provide better performance than 
generic SAT solvers in BMC. 

CirCUs’s input is a combination of And-Inverter-Graph (AIG [8]), CNF clauses, and 
BDDs. It combines the strengths of these different representations in a hybrid Boolean 
reasoning framework. The bounded model checker that uses CirCUs unrolls the model 
in the form of an AIG and applies optimizations like BDD sweeping and initial states 
propagation to it [8] to remove redundancy, which is a prime cause of inefficiency in 
SAT solvers. The constraints representing the property to be checked may be expressed 
as part of the AIG or as additional CNF clauses. Given these inputs, CirCUs transforms 
parts of the circuit into BDDs so as to apply powerful BDD-based implication algo- 
rithms [7]. It then looks for an assignment to the variables that satisfies all the outputs 
of the circuit and all additional constraints. 

* This work was supported in part by SRC contract 2003-TJ-920. 

R. Alurand D.A. Peled (Eds.): CAV 2004, LNCS 3114, pp. 519-522, 2004. 
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2 Objective-Driven Decisions 

The speed of SAT solvers depends critically on their ability to choose good decision 
variables. In BMC, while the unrolled transition relation is always satisfiable, even after 
propagating the initial states, adding constraints on the target states makes most SAT 
instances unsatisfiable. CirCUs assumes that good decisions concentrate on proving 
that the objective cannot be satisfied, where the objective is an assertion on one of the 
outputs of the AIG. (If that assertion can be satisfied, then a counterexample is found.) 
Hence, decision variables are chosen from the cone of influence of the objective. 

Even though the transition relation is satisfiable, a SAT solver will encounter con- 
flicts in the attempt to derive consistent assignments for the inputs and outputs of subcir- 
cuits. These conflicts will produce non-objective clauses that express local satisfiability 
conditions for the transition relation and auxiliary objectives. Such clauses remain valid 
even when the objective changes as a result of further unrolling, and may prevent the 
search of fruitless parts of the solution space. When used as an incremental solver, Cir- 
CUs marks non-objective conflict clauses when they are created, and re-uses them for 
all successive runs and time frames. The clauses that depend on the objective, on the 
other hand, may be deleted by periodic clause deletion in CirCUs. Those that survive to 
the end of a run are “distilled” to bias the decision variable selection for the next run. 

While this mechanism is not more powerful than the ones used in other generic in- 
cremental SAT solvers [17, 5] in identifying conflict clauses that remain valid from one 
run to the next, it either reduces the time required to detect what conflict clauses remain 
valid, or it removes the constraint that all conflict clauses must be kept. Compared to 
[14], CirCUs can identify more conflict clauses that remain valid. 

3 Experimental Results 

We have integrated CirCUs in VIS-2.1. To show the effectiveness of CirCUs, we com- 
pare the performance of four versions of BMC on the VIS benchmark suite [16]. All 
experiments have been performed on a 1 .7 GHz Pentium IV with 1 GB of RAM running 
Linux. We have set the time out limit to 10000 s. 

- Case A : VIS-2.0 [3, 16] BMC interfaced with Zchaff [11] 

- Case B : VIS-2. 1 BMC interfaced with Zchaff 

- Case C : VIS-2.1 BMC interfaced with CirCUs 

- Case D : VIS-2.1 BMC interfaced with Incremental CirCUs 

We compare CPU times with scatter plots on logarithmic scale. The efficiency of 
optimizations like BDD sweeping and initial states propagation is shown in Fig. 1. In 
Case B, we run Zchaff on the CNF written from the optimized AIG. Since we use 
the same SAT solver for both case A and B, we can study the effects of removing 
redundancy. There are cases when redundancy gives an advantage in solving SAT. The 
score based decision heuristic may get a better initial decision order from redundant 
circuits. However, on average, redundancy elimination is very helpful. 

In Fig. 2, we compare Zchaff’s and (non-incremental) CirCUs’s speed on the same 
BMC instances. CirCUs shows consistent improvement over Zchaff. CirCUs achieves 
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the larger speed-ups on the harder examples (up to lOx). Incremental vs. non-incre- 
mental SAT in CirCUs are compared in Fig. 3. Incremental SAT shows improvement 
especially for the harder cases. Because an incremental SAT run starts with additional 
clauses transferred from the previous run, in small examples it may incur significant 
overhead. We summarize the overall improvement of CirCUs over VIS-2.0 BMC in 
Fig. 4. Each scatterplot shows two lines; The main diagonal, and y = k ■ , where k 

and p are obtained by least-square fitting. Student’s t test confirms that the improvement 
visible in each plot is statistically significant. 




Fig. 1. Effects of redundancy removal 
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Fig. 2. CirCUs vs. Zchaff 




Fig. 4. Overall gains 



Thanks to its performance and flexibility, CirCUs is suited for the modular develop- 
ment of abstraction refinement algorithms and other complex SAT-based applications. 
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Abstract. The functional verification of microprocessor designs contin- 
ues to represent one of the difficult challenges confronting the design 
of commercial microprocessors. In addition, test logic, transient errors, 
and power considerations complicate the problems by creating additional 
complexity and constraints on design solutions. Rigorious mechanized 
mathematics, often called formal methods, are being used to assist with 
functional verffication and its use has spread to ensuring that test cov- 
erage and power limitations are met. While the successes have been no- 
table, the wide-spread use of mathematical methods is still limited. Here 
we give a brief introduction to formal microprocessor verification, and 
then we present some scientific and engineering issues that need address- 
ing to bring formal methods into the mainstream. 



1 Introduction 

The specification and formal verification of microprocessors represents an evolv- 
ing science and a difficult to reach goal. There have been many efforts to verify 
abstractions of microprocessor designs, each with its own specific abstractions 
and detail. A number of related approaches have been reported, each with a 
slightly different twist, but in spirit similar. We discuss how we have parti- 
tioned the verification of a microprocessor, and make some remarks about why 
partitioning seems essential. This discussion will transform into a vision that 
an integrated capabilities a general-purpose verification system should possess. 
Finally, we present some research problems we believe need addressing if me- 
chanical mathematical formal methods are to become mainstream desigin tools 
for industrial-sized designs. 

We discuss several approaches that are typical for specifying the correctness 
of microprocessors. Even though many successful proofs have been done, there 
is no general agreement about what a suitable correctness statement is for mi- 
croprocessors is, especially super-scalar, pipelined microprocessor designs that 
include memory delays, exceptions, memory management, and external inter- 
rupts. We discuss why the correctness specification of a processor possessing 
external interrupts can be particularly vexing. Instead of attempting to sum- 
marize all of the outstanding work that has been done, we refer the interested 



R. Alur and D.A. Peled (Eds.): CAV 2004, LNCS 3114, pp. 523-533, 2004. 
(c) Springer- Verlag Berlin Heidelberg 2004 




524 



Warren A. Hunt, Jr. 



reader to Aagaard, et. al., paper A Framework for Microprocessor Correctness 
Statements [1], where an extensive reference list can be found. 

To illustrate the kinds of integrated capabilities that a general-purpose verifi- 
cation system should have we describe a hypothetical verification tool FMaAT 
(Formal Modeling and Analysis Tool) . Our description of FMaAT includes an 
integrated environment of language, database, checkers, provers, and autonomic 
regression and analysis routines. From our perspective, it is key that all of the 
design information and design meta data be represented in a unified database 
so that all manner of analysis can operate on the model of the design. 

The excellent work performed by the formal verification community has 
shown the possibility of verifying microprocessor models, but we are far away 
from being able to apply our mechanical tools to complete industrial designs. 
Of course, we use mechanical tools all the time, and some, for instance, Boolean 
equivalence checkers, are regularly used on very large parts of modern designs. 
But we still seem far away from being able to process a complete high-level for- 
mal description of a microprocessor which includes all of its associated safety and 
liveness properties properties and (bi-)simulation relations that ultimately need 
to be checked on the actual RTF design description or an equivalent abstraction 
of the transistor-level design. If mechanical tools based on formal methods are 
to be used broadly, then it must be possible to compile and build a model of all 
specification levels. 

We summarize a correctness statement that is sufficient to specify the cor- 
rectness of a processor with exceptions, supervisor/user modes, memory delays, 
branch prediction and external interrupts. We present this specification in Sec- 
tion 2, and we discuss hierarchical verification in Section 3. We present a snapshot 
of what kinds of information that should be included with a design specification, 
and we argue in Section 5 that complete design data be available to all tools 
for manipulating and analyzing designs. We postulate what characteristics the 
FMaAT system should include to permit the wide-spread use of formal verifi- 
cation techniques in the design process. In Section 6 present some engineering 
obstacles that need addressing to ensure that formal analysis techniques are 
suitable for general use. 

We note that this paper was written to provide some background for a micro- 
processor verification tutorial given by the author at this conference. Therefore, 
this paper is written in a conversational style, and it is provided as a partial 
record of what was presented and as a basis for further discussion. This pa- 
per was written not to explore a single topic in depth, but to present a vision 
for microprocessor verification techniques and tools. We conclude by describing 
areas of research we think need attention if formal mathematical verification 
techniques are to become mainstream. 

2 Correctness Diagrams 

The specification of correctness of a microprocessor can itself be subtle. This is 
obviously critical as it does no good to prove a vacuous or flawed theorem. In 
fact, there isn’t general agreement about what suitable correctness is. Instead of 
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attempting to justify a particular approach, we present several correspondence 
diagrams and conclude why we chose an approach we took. 

Consider the correctness diagram in Figure 1. This diagram is meant to 
indicate that a micro-architecture (MA) design requires some number of steps to 
execute one ISA step; this is a slight generalization of Burch and Dill’s approach 
[5] as it do not restrict the MA to only a single step. This was the correctness 
diagram was used to state the correctness of the FM8501 microprocessor [8]. This 
diagram, together with induction, can be used to show the correctness of any 
finite sequence of instructions. Such relationships between single and multi-step 
sequences has analyzed with the Microbox Framework [2] . 

The diagram in Figure 1 shows the ISA state being a proper subset of the 
MA state, and the correctness statement indicates that the MA state can be 
projected into a corresponding ISA state at certain points. The arrow mov- 
ing from MA states to the ISA states are projection (abstraction) functions. 
In non-pipelined microprocessor implementations, these abstraction functions 
are simple projections but in the case of pipelines microprocessors, the MA im- 
plementation is often used to flush (or retire) in-flight instructions so a simple 
projection function can be used. 




Figure 1 is not general enough to verify processors that have non-determi- 
nistic external interrupts. What is the problem? In a modern microprocessor, 
(almost) all in-flight instructions are immediately flushed to keep the interrupt 
latency small, and to prevent a subsequent instruction from creating yet another 
exception. Consider an external interrupt event interrupting the first MA tran- 
sition in the sequence of MA transitions. Normally, flushing from a particular 
state would permit the in-flight instructions to complete, but with an interrupt 
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most, if not all, in-flight instructions are flushed without completion. The typical 
Birch-Dill flushing process lets all in-flight instructions complete before a projec- 
tion is performed, but with an external interrupt in-flight instructions that will 
normally complete from an earlier MA state will be flushed. If we then project 
the corresponding ISA state from the MA state just after an interrupt, this may 
actually produce a state which, in some sense, is earlier in time than the state 
produced by letting the MA implementation only finish in-flight instructions. 




Flushed 



Fig. 2. Super-scalar Microprocessor Correctness Diagram with Interrupts 

To provide for asynchronous external interrupts in the correctness state- 
ment, it is necessary for the ISA specification to accept external interrupts as 
well. That is, the different execution paths that can be taken by the MA-level 
machine must also be possible at the ISA level; otherwise, the ISA specifica- 
tion cannot be kept in correspondence with the MA-Ievel implementation. Such 
a generalized correctness statement is pictured in Figure 2, where bifurcations 
represent possible execution path changes due to interrupts. Of course, the de- 
sign of MA-level flushing mechanism must operate in such a manner that the 
diagram can be used. This diagram was the correctness diagram for the FM9801 
microprocessor verification [11,15-17]; the flushing mechanism in the FM9801 
essentially “throws away” partially completed work, so it can quickly respond to 
external interrupts, and this mechanism prevents the problem mentioned above. 
Even though we believe we used correctness diagram in a sound manner, Mano- 
lios showed that it is possible to satisfy this type of diagram with trivial, wrong 
implementations [12]. Correctness statements can be very subtle. 

In a companion paper [14] , we describe another correctness diagram suitable 
for microprocessors with sophisticated features and external interrupts. All these 
correctness statements are designed to be hierarchical. We can construct another 
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commuting diagram on top of these correctness statements, building a stack of 
verified layers [3]. In the next section, we discuss breaking down an individual 
commuting diagram proof into pieces. 

3 Compositional Verification 

The internal designs of modern microprocessors are very complex. In fact, the 
designs are so complex that attempting to just use symbolic simulation to satisfy 
one of the correctness specifications given in the previous section will not work. 
Instead a proof has to be broken down into smaller pieces so that can be later 
composed to produce a complete proof of the desired result. 

There are many ways that a proof can be broken down, but whether the 
sublemmas are either single-step properties (or invariants) or multi-step prop- 
erties certainly changes the ease with which the sublemmas can be composed. 
Single-step properties are easy to compose - so long as state-space restrictions 
are met. Multi-step properties are often much more difficult to compose because 
their environmental requirements and also their (possibly multi-step) conclu- 
sions may be very difficult to “stitch” together. Why? The assumptions (usage 
environments) can be very complex, and it may be difficult or impossible to 
satisfy both sets of assumptions simultaneously. 

Composing two or more individual multi-step properties, or other multi- 
step invariants, into a single lemma may also produce a conclusion that itself 
is so specialized that is may difficult to state or prove. For instance, consider 
composing a four or more step property with a two-step property. Just how 
should the environments be aligned so the lemmas can be composed? Should 
the conclusion be about two steps or four steps? With single-step invariants, 
this is a much easier task as only environmental restrictions on the (reachable) 
states need to be considered when composing results. We used such single-step 
invariants to prove the correctness of the FM8501, FM8502, FM9001, Motorola’s 
Complex Arithmetic Processor DSP, and the FM9801 microrprocessors [8,9,4, 
10 ]. 

Industry has equiped itself with several kinds of FSM exploration tools, most 
notably model checkers and (G)STE engines. These tools have been used to great 
advantage, exploring various particular design questions in great depth. However, 
these tools can only be practically applied to subsets of modern microprocessor 
designs. Generally, the results from these tools provide multi-step property veri- 
fications, and as such, are very difficult to compose. This is a critical issue. When 
automatic FSM exploration tools are available, they can be put to profitable use 
on many parts of a design, but as the designs grow, and the number of different 
properties grows, the re-assembly process become extremely complicated. 

We believe, if properties verified with FSM exploration tools are to be com- 
posed, then either a high-level theorem prover should be used to decompose the 
proof obligations into pieces that can be checked in a manner similar to McMil- 
lan’s [13] approach or FSM property tools should be used to prove one-step 
invariant properties. In this way, the discoveries made with FSM exploration 
may be safely composed. 
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Even in light of the difficulties of composing results derives from various 
FSM exploration tools and checkers, we note that these tools have discoverd 
many design flaws and checked many complex properties. We have no doubt of 
the importance of FSM exploration tools as they are used today. These tools are 
productive and they will continue to be important, and the use of these tools 
has help defined what the state-of-the-art-of-the-practice is. 



4 Current Practice 

The state-of-the-art-of-the-practice of the use of formal mathematical methods 
with mechanically-implemented implementations varies from design to design. 
My impressions with the industrial use of formal methods comes from time I 
spent working for IBM Research in Austin, Texas, but through conversations 
with my colleagues at other companies, my experiences seem typical. 

The application of formal methods to design projects varies widely, for many 
practical reasons: size and importance of project, available tools and people, 
confidence of architects and managers in the available technology, degree of in- 
tegration of formal methods tools in the tool flow, size of the company, and 
duration of the project. The use of formal methods is primarily centered in a 
dozen or so of the largest companies, probably due to the cost of creating a crit- 
ical mass of infrastructure (meaning tools, people, and design practice). Even 
in the larger companies, there are great differences in the degree to which such 
techniques are deployed. Many smaller projects did not use formal methods, 
while in larger design efforts, there may be a dozen or more people. 

From the project management point of view, the use of formal methods 
represents a vexing challenge, due to the lack of available metrics to know when 
its use is efficient and sufficient. This is not so different from simulation, but 
over many years management has developed “coverage” metrics that help them 
gauge when there has been “enough” simulation. For complete designs, such 
intuition has not been developed for mechanical mathematical methods. We are 
optimistic that coverage metrics can be defined because with a proof, it is very 
clear what has been proved and what has been assumed. 

5 Challenges 

We see a number of challenges to improve mechanical mathematics tools so 
that they are regularly used on commercial-sized designs. First, we recognize 
that such tools are already being deployed, and are regularly being used to 
examine large parts of modern designs. For instance, equivalence checkers are 
now being regularly being used to ensure that low-level gate and transistor-level 
design specifications implement Boolean RTF micro-architectural descriptions. 
In addition, model checking [6] and (G)STE [18] are being successfully to validate 
various design elements. Theorem proving systems have been applied in niche 
areas, such as floating-point algorithms, and they may provide the “glue” to 
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bind together different tools. Even so, we don’t think of formal tools being part 
of the “model build” that is often done each evening during the design process. 

After developing and using mathematical modeling and analysis tools in a 
commercial environment, we have identified several challenges that we believe 
need to be addressed before mathematical analysis will regularly occur at the 
higher levels of the design hierarchy. So, if we were to build a system called 
FMaAT (Formal Modeling and Analysis Tool) , we would like it to have a num- 
ber of properties. 

— FMaAT needs to be able to read, compile, and “model build” the entire 
design specification. FMaAT should be able to read the entire design, and 
represent such a design as a formal object. Engineers do not trust tools that 
cannot manipulate the actual design specification. If one were to print the 
complete RTL description for a modern microprocessor, it may well require 
30,000 or more pages. 

— The FMaAT system must, in all respects, operate in a hierarchical manner. 
Every design is yet just another piece of an even larger design in the future. 

— FMaAT needs to contain all embedded annotations. That is, if a module 
has a requirement that its inputs are one-hot and that its outputs are active- 
low, then this data must be included in the original design specification and 
it must also have a representation as a formal object that FMaAT can 
inspect, manipulate, and subject to analysis. 

— FMaAT needs to be able to act as a database engine that allows every 
design module and interface to be identified. It must be possible to uniquely 
identify every primitive element, interface, and wire. A completely precise 
and unique naming convention is a requirement. 

— Each time a change is made to the design, the effects of the change should 
be automatically propagated to the FMaAT database so when an analysis 
is requested, only the relevant parts are subjected to analysis. 

— FMaAT needs to be able to compute cones-of-infiuence, bus conflicts, im- 
proper connections, and other user-definable queries. 

— FMaAT must have a command-line interface. In a big design project, tools 
are always “taped” together with scripting languages to overcome deficien- 
cies in the design flow. FMaAT’s command-line interface should itself be 
described formally. 

~ There has to be a way to re-run all checkers, simulators, etc., automatically 
whenever there is any change to the design. Automatic regression verification 
is a must. 

— There must not be any way to get a false positive. There must be provisions 
for ensure vacuity checking for analysis requests. 

— If possible, FMaAT should have some kind of analysis “coverage metrics”. 
If included, then a formal definition of coverage should also be included, so 
that some kind of qualitative assessment can be made as to the thoroughness 
of an analysis. 
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— Along with having all of the design and associated property specifications 
directly available in a single database, it is critical that there be a semantics 
that allows the various tools (and the results derived from these tools) to 
be safely composed. This can only be done if FMaAT contains a general- 
purpose theorem prover. 

— FMaAT must provide a means to write a truly rigorous high-level specifi- 
cation. System-C and System Verilog are not a long-term answer; in fact, 
these languages are creating yet more problems. 

— A purely functional verification system is not sufficient. FMaAT must a 
way to specify non-functional properties such as power requirements, circuit 
sizes, wire types, physical location data, environmental requirements, and 
other critical design properties. And for each such property, suitable checkers 
and verifiers will need to be provided. 

— FMaAT must deal with a distributed design process. No project of a sig- 
nificant size is all done in a single place. 

— Finally, FMaAT must safely extensible; that is FMaAT should be no more 
difficult to extend than Emacs, but FMaAT should impose a discipline that 
ensures that extensions do not render existing checker and verifiers unsound. 

A tool like FMaAT will require a much more general language than those 
commercially available, such as Verilog and VHDL and their derivatives. The 
limitations of the available languages are causing the specification problem to 
actually become worse because designers are forced to record their design prop- 
erties as comments or in external files. The available existing design languages 
do not have associated specification languages. A community- wide effort has re- 
sulted in the Accellera [7] standardized property specification language, but even 
this language does not have a formal relationship to the systems (e.g., designs 
coded in Verilog or VHDL) it is meant to specify. 

Future system design languages need to have fully integrated specification 
languages and fully integrated annotation languages. In this way, the analysis 
tools (checkers, simulators, theorem provers) can all get access to any or all of 
the design artifact, thus providing a unifying framework for designs and their 
specifications. And all such analysis tools must be defined using the same se- 
mantic foundation so results from one analysis tools can be immediately used 
by other analysis tools. 

6 Research Problems 

There are many technical and engineering challenges that remain before mechan- 
ical formal mathematical methods become fully integrated into the commercial 
design flow. We discuss these obstacles with the hope that our community will 
help solve these problems. 

To make a system like FMaAT will require fundamental changes to the 
infrastructure of commercial design environment. With careful planning and 
execution, it should be possible to incrementally improve commercial design tools 
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so that their foundation is suitable to allow the wide-spread use of mechanical 
formal methods. 

— New formally specified design and annotation languages need to be defined 
that provide a semantically unified framework for designs and all associated 
specifications. These languages need to include mechanisms to represent all 
of the design “meta” data directly as a part of the design specification. For 
instance, that some inputs are “one-hot” and some outputs are “active-low” 
needs to be captured just as some safety or liveness property. All of this data 
needs to be expressible in this language. Such a design language also needs 
to be general enough to express non-functional properties such as area and 
power constraints. These languages must be hierarchical. 

— To reduce power consumption, there is going to be a greater use of asyn- 
chronous circuit elements along with circuits that can trade execution speed 
with power requirements. Specification and analysis of mixed circuit types 
will be necessary, and we need to develop modeling and verification tech- 
niques capable of supporting designs with a mixture of digital, asynchronous, 
and analogue circuits. 

— Typical two- and four-valued simulators need to be extended to symbolic 
simulator; that is, there should be a single simulation environment general 
enough to perform simulation with a mixed set of constants and symbolic 
variables. Moving from one simulation environment to another is error prone 
and confusing. 

— All of the analysis tools (e.g., equivalence and model checkers, (G)STE en- 
gines, reachability analysis, theorem provers) should all read the same design 
data and all follow the same semantics. In other words, we must achieve in- 
tegration between the various analysis tools so results from one tool can be 
used elsewhere. Implementations of these tools have chosen their own logics; 
we need some kind of unification so results can be shared and reused among 
the various tools. 

— A new suite of non-functional checkers (e.g., for power, area, redundancy 
management) need to be developed. These checkers should receive the same 
level of rigor and development as existing formal analysis tools. 

— Post-silicon design approaches need to be integrated into the design process. 
Post-silicon debugging tools (e.g., logic analyzers) have not improved much 
in the last decade and the amount of visibility continues to decrease as 
implementations become more and more integrated. Future formal design 
languages must be general enough to also permit the specification of the 
supporting chip sets and the systems themselves. 

— Autonomic systems that automatically (re-)run all checkers and provers 
should be automatically started any time any part of a design is changed. 
This is necessary as no large design effort is now done in a single location. 
This system, if you will, is a super-CVS, ensuring that all design properties 
are pro-actively analyzed. 

— Formal approaches to (microprocessor) security need to be developed. Fu- 
ture processors will be systems-on-a-chip, and the specification of security 
features and analysis of security properties is going to be critical. 
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These are some of the issues that need solving to broaden the impact of math- 
ematics on the design of microprocessors, and computing systems in general. A 
sustained, long-term effort will be required to extend the state-of-the-art-of-the- 
practice. 

7 Conclusion 

The functional verification of microprocessor-sized designs will continue push the 
state-of-the-art and the state-of-the-practice of mathematical formal methods. 
As the complexity and sheer number of microprocessors continues to increase, we 
see no practical alternative to the use of formal mathematics supported by me- 
chanical reasoning tools. Mathematics is the only technique that can scale with 
the ever increasing size and complexity, and mechanized mathematical specifi- 
cation and proof are the only practical infrastructure for correct, reliable, and 
secure microprocessor. 

We would like computing systems to be specified by a formula manual, a com- 
plete precise set of formulas that exactly specifies computing systems (whether 
hardware, software, or both). We want mathematically specified, mechanically 
checked computing systems. Systems are increasing in complexity faster than 
our ability to manage them or control them. If we are aggressive, maybe we 
can achieve this vision on small commercial designs, e.g, cell telephones, pagers, 
routers, etc. Our ability to field secure, correct systems is based on our ability 
to specify and validate our computing, networking, and control systems. 

The use of mathematical formal methods will continue to broaden. It is the 
most economical method to ensure correctness, reliability, power usage, and se- 
curity, of future designs. No other analytical techniques known to us will be able 
to scale with future design requirements. We are impressed with the progress and 
we look forward the challenge of extending the use of mathematics for design. 
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